Normalization of a multi-dimensional set object

ABSTRACT

Methods and apparatus, including computer systems and program products, for normalizing computer-represented collections of objects. A first minimum value can be normalized based on a second minimum value of a universal set object that corresponds to the first set object. The second minimum value is both a minimum value supported by a data type (e.g., 1-byte integer) and a minimum value defined to be in the universal set object (e.g., 0 for a universal set of all natural numbers). Similarly, a first maximum value can be normalized based on a second maximum value of the universal set object where the second maximum value is both a maximum value supported by a data type and in the universal set object. Intervals can be normalized, which can involve replacing half-open intervals with equivalent half-closed intervals. Also, a consecutively ordered, uninterrupted, sequence of values of a set object can be normalized.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of, and claims priority to, U.S. application Ser. No. 10/955,749, filed Sep. 30, 2004, entitled MULTI-DIMENSIONAL SET OBJECT.

BACKGROUND

The following description relates to systems, techniques, and computer program products for machine-implemented representations of a collection of objects.

Set theory is the mathematical theory of sets, which represent collections of objects. It has an important role in modern mathematical theory, providing a way to express much of mathematics.

Basic concepts of set theory include set and membership. A set is thought of as any collection of objects, called members (or elements) of the set. In mathematics, members of sets are any mathematical objects, and in particular can themselves be sets. Thus, the following can be referred to as sets: the set N of natural numbers {0,1,2,3,4, . . . }, the set of real numbers, and the set of functions from the natural numbers to the natural numbers; but also, for example, the set {0,2,N} which has as members the numbers 0 and 2, and the set N.

Some fundamental sets include the empty set, the universal set, and the power set. The empty set represents that no members exist in the set, the universal set represents all members in a given context, and the power set of a set U (i.e., P(U)) represents the collection of all subsets of a given universal set U. For two sets, a Cartesian Product set can be defined as a set of all ordered pairs whose first component is an element of a first set and whose second component is an element of a second set.

Some set operations, i.e., operations that are performed on sets, include equality, containedness, complement, union, and intersection. Equality is an operation used to determine if two sets are equal (i.e., all members of one set are in another set and vice versa); containedness is an operation used to determine if one set is within the bounds of another set; complement is an operation used to determine in a given context of a universal set the set of members that do not belong to a given set; union is an operation used to determine a set including members of two sets; and intersection is an operation used to determine a set including members that are common to two sets.

SUMMARY

Described herein are systems, techniques, and computer program products for machine-implemented representations of a collection of objects.

In one general aspect, the techniques feature a method of representing a collection of objects in a computer system. The method includes providing data structure definitions that define a set object to represent the collection of objects; and generating, with a computer-implemented constructor using the one or more data structure definitions, a set object representing the collection of objects.

Implementations may include one or more of the following features. The data structure definitions may define the set object to be a one-dimensional set object including one or more ranges of elements. The data structure definitions may define the set object to be a one-dimensional set object including a first list of knot elements, a second list representing the knot elements that the one-dimensional set includes, and a third list representing elements, other than the knot elements, that are included in the one-dimensional set. The first list may be a one-dimensional vector of values corresponding to the knot elements of a one-dimensional set. The second and third lists may be defined to be alternating elements of a one-dimensional bit vector where the third list is defined to indicate whether the one-dimensional set includes a range that is bound by one or more knot elements. Alternatively, the second and third lists may be separate one-dimensional bit vectors where the third list is defined to indicate whether the one-dimensional set includes a range that is bound by one or more knot elements.

The data structure definitions may define the set object to be a multi-dimensional set object including a union of blocks of a partition, where each block defines a disjoint collection of the objects. The multi-dimensional set object may be defined to include a first list of knot elements, a second list representing blocks corresponding to the knot elements that the multi-dimensional set includes, and a third list representing blocks corresponding to elements, other than the knot elements, that are included in the multi-dimensional set.

A multi-dimensional set object may be defined to include a Cartesian Product of two or more dimensions of the collection of the objects. The Cartesian Product may be defined to include two references to set objects, where a first reference corresponds to a first Cartesian Factor and a second reference corresponds to a second Cartesian Factor. In that case, the references are ordered such that the references define a Cartesian Product of the collection of the objects. A multi-dimensional set object may include set objects nested within the multi-dimensional set object.

A multi-dimensional set object may be defined to include a union set. In that case, the union set includes blocks of a partition such that each block defines a disjoint collection of the objects and each block is a reference referring to a one-dimensional set, a union set, an empty set, a universal set, or a Cartesian Product set. In that case, the Cartesian Product set includes a Cartesian Product of a first Cartesian Factor being a one-dimensional set and a second Cartesian Factor being a reference referring to a union set, a Cartesian Product Set, or a one-dimensional set.

A multi-dimensional set object may be defined to include a Cartesian Product set, including a Cartesian Product of a first Cartesian Factor being a one-dimensional set and a second Cartesian Factor being a reference referring to a union set, a Cartesian Product Set, or a one-dimensional set. In that case, the union set includes blocks of a partition such that each block defines a disjoint collection of objects and is a reference to a one-dimensional set, a union set, an empty set, a universal set, or a Cartesian Product set.

The one or more data structure definitions may define the set objects as normalized sets. The generated set object may represent data on a storage medium. In that case, the method further includes receiving a query; and computing the result to the query, such that computing the result to the query includes determining whether to access the data in the storage medium based on the generated set object.

The one or more definitions may define operations for set objects, including an operation to determine an intersection of two or more set objects, an operation to determine a union of two or more set objects, an operation to determine whether a set object includes a range, and an operation to determine a complement of a set object.

In another aspect, a computer program product tangibly embodied in an information carrier includes one or more data structure definitions including a constructor method for generating a set object. In that product, the set object represents sets in a computer system.

Implementations may include one or more of the following features. The data structure definitions may define a one-dimensional set object as one or more ranges of elements. The data structure definitions may define a one-dimensional set object to include a first list of knot elements, a second list representing the knot elements that the one-dimensional set includes, and a third list representing elements, other than the knot elements, that are included in the one-dimensional set.

The data structure definitions may define the set object to be a multi-dimensional set object that includes a union of blocks of a partition, where each block defines a disjoint collection of the objects. The multi-dimensional set object may be defined to include a first list of knot elements, a second list representing blocks corresponding to the knot elements that the multi-dimensional set includes, and a third list representing blocks corresponding to elements, other than the knot elements, that are included in the multi-dimensional set.

The data structure definitions may define a multi-dimensional set object to include a Cartesian Product of two or more dimensions of a collection of objects. The multi-dimensional set object may be defined to include two references to set objects. In that case, a first reference corresponds to a first Cartesian Factor and a second reference corresponds to a second Cartesian Factor, and the references are ordered such that the references define a Cartesian Product of the collection of the objects.

The generated set object may represent data on a storage medium. In that case, the computer program product includes instructions to cause data processing apparatus to perform operations including receiving a query; and computing the result to the query, such that computing the result to the query includes determining whether to access the data in the storage medium based on the generated set object.

The one or more definitions may define operations for the set object to include an operation to determine an intersection of two or more set objects, an operation to determine a union of two or more set objects, an operation to determine whether the set object includes a range, and an operation to determine a complement of the set object.

In another aspect, a computer program product, tangibly embodied in an information carrier, includes instructions to normalize a first minimum value, a first maximum value, or both the first minimum and maximum values of a first set object in accordance with a first process, and perform a set operation on a normalized version of the first set object that was normalized in accordance with the first process. In that product, the first minimum value is normalized based on a second minimum value of a universal set object that corresponds to the first set object, and the second minimum value is both a minimum value supported by a data type (e.g., 1-byte integer) and a minimum value defined to be in the universal set object (e.g., 0 for a universal set of all natural numbers). Similarly, the first maximum value is normalized based on a second maximum value of the universal set object, where the second maximum value is both a maximum value supported by a data type and in the universal set object.

Implementations may include one or more of the following features. The first minimum value of the first set object may be modified if the first minimum value is the same as the second minimum value.

The first set object may use a combination of knot elements and partition entries to represent a collection of objects. The first minimum value may be a minimum knot element of the first set object and the instructions to modify the first minimum value may include instructions to modify the first set object to include a value of a second partition entry in a first partition entry, where the first and second partition entries are in an ordered sequence of partition entries with the first partition entry being before the second partition entry. The partition entry may be a partition bit or partition set.

The first maximum value may be a maximum knot element of the first set object, and the first maximum value of the first set object may be modified if the first maximum value is the same as the second maximum value.

The instructions to modify the first maximum value may include instructions to modify the first set object to include the value of a penultimate partition entry in a last partition entry.
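As a minimal sketch of the first process, the minimum and maximum normalization might look as follows in Python. The sketch assumes a one-dimensional set object stored as a sorted list of knot values plus one list of partition bits laid out as link, knot, link, ..., link; this layout and the names normalize_bounds, type_min, and type_max are illustrative assumptions, not taken from the description. When the lowest knot equals the smallest value of the data type, no representable value lies below it, so the first partition entry carries no information and is overwritten with the value of the second entry. The maximum is handled symmetrically with the penultimate and last entries.

```python
def normalize_bounds(knots, bits, type_min, type_max):
    """Return a copy of (knots, bits) with the outermost link bits normalized.

    Assumed layout (hypothetical): bits[2*i] is the link below knots[i],
    bits[2*i + 1] is the knot bit of knots[i], and bits[-1] is the link
    above the last knot.
    """
    knots, bits = list(knots), list(bits)
    if knots and knots[0] == type_min:
        # Nothing representable lies below the minimum knot:
        # copy the second partition entry into the first.
        bits[0] = bits[1]
    if knots and knots[-1] == type_max:
        # Nothing representable lies above the maximum knot:
        # copy the penultimate partition entry into the last.
        bits[-1] = bits[-2]
    return knots, bits


# 1-byte unsigned integers: the two inputs differ only in the meaningless
# link bit below the knot 0, and normalize to the same representation.
a = normalize_bounds([0, 10], [0, 1, 1, 1, 0], 0, 255)
b = normalize_bounds([0, 10], [1, 1, 1, 1, 0], 0, 255)
```

After this step the two inputs compare equal entry by entry, which is the property the first process is meant to provide.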

The normalized version of the first set object may be further normalized in accordance with a second process. Normalizing in accordance with the second process may include normalizing consecutively-ordered elements of the first set object, where a first element is in a first ordered sequence before a second element and the universal set object includes the first and second elements as an uninterrupted second ordered sequence of elements.

The normalized version of the first set object may be further normalized in accordance with a third process. Normalizing in accordance with the third process may include normalizing representations of intervals in a first set object. The intervals may represent a span of objects in the first set object. Both the second and third processes may be used individually, or in any combination with each other or the first process, to normalize the first set object.

Normalizing consecutively-ordered knot elements of the first set object may include removing one of the first element or the second element. The first and second elements are knot elements that have corresponding partition entries, and removing one of the first element or the second element may include removing the first element and a corresponding partition entry representing inclusion of the first element in the first set object.

The intervals may be normalized to generate a single representation of a same span of objects across different set objects, and normalizing the representations may include replacing a half-open interval with an equivalent half-closed interval. Replacing a half-open interval with an equivalent half-closed interval may include removing a knot element and corresponding partition entry of the first set object.

The systems, techniques, and computer program products for machine-implemented representations of a collection of objects described here may provide one or more of the following advantages.

A set object may be provided that approximates and models real world collections of objects. To generate approximations, a certain completeness may be provided; in other words, operations on objects of a model will not lead to objects that cannot be represented by the model; otherwise, problems may occur like the approximation of numbers in a computer system that may lead to rounding problems, underflow exceptions, and overflow exceptions. To achieve completeness in the sense that set operations such as complement, intersection and union within the model will not lead to sets that the model cannot represent, the set object may be adapted. For example, sets may be normalized under certain conditions. Efficient algorithms for basic set operations, such as intersection, complement, and union, may also be provided.

The set object may improve the efficiency and/or provide an intelligent way to express collections of objects for techniques and/or systems that involve near-line storage, planning, and authorization.

The set objects may be normalized to provide a consistent representation of objects in a set. One or more types of processes can be used to generate different types of normalization. The processes for normalizing set objects may be optimized such that consistent, reproducible results are generated when operations are performed on normalized set objects. As an example, for a universal set being the set of all integers, the sets i<4 and i≦3 represent the same integer values; however, they could each have different representations in a computer. By providing a method of normalizing set objects, different representations of these sets might be changed to one type of representation. Advantageously, a consistent representation might ensure consistent results and ease the implementation of operations on set objects. For example, if one desired to implement an operation to determine if two sets were identical, it might be easiest to simply check to see if their representations are identical. However, if the representations differ, there might be inaccurate results (e.g., a determination that two sets are not equal, although only their representations differ) or the operation might require a much more complex implementation that takes into account inconsistencies between representations of sets. Advantageously, a combination of processes for normalizing set objects might be used as a practical approach to achieving consistently normalized set objects. For example, normalizing a set object can include normalizing maximum and minimum values, removing redundant values, consistently representing intervals, and consistently representing consecutive values.

In addition, normalization may simplify user input when a set is defined by a user. For example, a user of a software application can enter different selection conditions to define a same set of objects, and the resultant set objects can be normalized so that the set objects are consistently defined within the computer. For example, a user could enter, for the universal set of integer values n, the condition “n BETWEEN 6 AND 9.” Yet, the computer might generate a normalized set object and interpret that set object as being equivalent to the conditions “n IN (6,7,8,9),” “n>=6 AND n<10,” or any other variant.
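The equivalence of these conditions can be illustrated with a deliberately simplified Python sketch. It assumes an integer universal set and represents a span by a pair of inclusive bounds rather than by the knot and partition structure described elsewhere; the function names closed_bounds and bounds_from_values are hypothetical.

```python
def closed_bounds(low, low_inclusive, high, high_inclusive):
    """Return the equivalent inclusive integer bounds (low, high)."""
    if not low_inclusive:
        low += 1      # n > low   is the same as  n >= low + 1 for integers
    if not high_inclusive:
        high -= 1     # n < high  is the same as  n <= high - 1 for integers
    return (low, high)


def bounds_from_values(values):
    """Return inclusive bounds for a list of consecutive integers."""
    ordered = sorted(set(values))
    if not ordered or ordered != list(range(ordered[0], ordered[-1] + 1)):
        raise ValueError("values do not form one uninterrupted run")
    return (ordered[0], ordered[-1])


between = closed_bounds(6, True, 9, True)       # n BETWEEN 6 AND 9
half_open = closed_bounds(6, True, 10, False)   # n >= 6 AND n < 10
in_list = bounds_from_values([6, 7, 8, 9])      # n IN (6,7,8,9)
```

All three variables hold the pair (6, 9), so a plain comparison of representations is enough to decide that the conditions describe the same set. The same conversion maps i<4 and i≦3 to one upper bound.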

Details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages may be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

These and other aspects will now be described in detail with reference to the following drawings.

FIG. 1 illustrates a model of representing a set as a set object.

FIGS. 2A through 2D illustrate various representations of collections of objects.

FIG. 3 illustrates a model of representing a set as a one-dimensional set object.

FIG. 4 illustrates a multi-dimensional set object.

FIG. 5 illustrates a model for representing a set as a multi-dimensional set object.

FIG. 6 is a flowchart of a process of calculating the complement of a Cartesian Product set object.

FIG. 7 is a flowchart of a process of calculating the complement of a union set object.

FIGS. 8A and 8B include a flowchart of a process of calculating the intersection of two one-dimensional set objects.

FIGS. 9A through 9C include a flowchart for calculating the intersection of two multi-dimensional set objects.

FIGS. 10A, 10B, and 10C include flowcharts illustrating processes for performing an equality check on a one-dimensional set object, a Cartesian Product set object, and a union set object, respectively.

FIGS. 11A-11E include a flowchart of a process of normalizing a set object.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

The systems and techniques described here relate to machine-implemented representations of a collection of objects.

Overview

A model for representing sets as set objects may be described in the following pages in accordance with the following chain of subsets in an arbitrary universal set U:

$$C \subset S \subset T \subset U$$

In that chain of subsets, U is a given universal set that need not be finite and is not necessarily completely representable on a computer system (e.g., U could be the set of real numbers); however, the elements of U are ordered. T will be an approximation of U that has a representation in a certain programming language and on a certain hardware platform (e.g., T could be the set of all different values which can be stored in a variable of data type float). S can be the set of all values that are allowed by an application, developed with the programming language, as possible input. The elements of S need not be stored and enumerated until the first time they are used. Finally, C is the set of all values that are in current use at a certain time (e.g., persistently or transiently) by the application.

Data structures that represent sets within the chain of subsets can be described as representations of a closed subset M of the power set P(U). “Closed” means that for each element of M, the complement with respect to U is also an element of M. In addition, for two arbitrary elements of M, their intersection and their union are elements of M.
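The closure property can be checked mechanically for small finite examples. The following Python sketch uses a finite stand-in for U and a hypothetical helper is_closed; it only illustrates the definition and is not part of the described data structures.

```python
def is_closed(family, universe):
    """Check closure of a family of subsets under complement, intersection, union."""
    family = set(family)
    for a in family:
        if universe - a not in family:        # complement with respect to U
            return False
        for b in family:
            if a & b not in family or a | b not in family:
                return False
    return True


U = frozenset({1, 2})                          # finite stand-in for U
trivial = {frozenset(), U}                     # the family {empty set, U}
not_closed = {frozenset(), frozenset({1}), U}  # lacks the complement {2}
```

Here is_closed(trivial, U) holds, while is_closed(not_closed, U) does not, because the complement of {1} is missing from the family.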

The four kinds of sets (C, S, T, and U) can also be described by how they typically can change over time. For the universal set U, there is an assumption that the elements of U are fixed in time. The set T can change in time when the data type of T is replaced by a new one which includes all values of the previous one and which is still a subset of the universal set U (e.g., changing the data type of a certain domain from 4-byte integer to 8-byte integer, or from float (single-precision) to double, or increasing the length of a character domain (the data type of a character domain and the concept of domains are concepts from Advanced Business Application Programming (ABAP); ABAP specification available from SAP AG of Walldorf (Baden), Germany)). The set S can also change over time, and can change more frequently than T. For example, the set S can change when an application's logic for checking user input is changed (e.g., if an application accepts a larger range of input). In such a scenario, a change of S must not exclude values of C. C can be the most frequently changed set (e.g., elements (or values) can be added to C whenever new values are entered into the system as input).

Representation of Sets

In general, if all possible subsets for a given universal set U are to be included in a computer representation of sets, all elements of the power set P(U) of U should be considered. If U cannot be represented on a computer system, then P(U) cannot be represented either. A certain subset of P(U) should be defined such that the subset can be represented on a computer system and the subset can have the property of being complete, as described earlier. This should also be the case when there exists one subset of P(U) which has the property of completeness and has the same property of completeness for all universal sets U (i.e., all types of universal sets, where each set may have a different context). An example of a subset of P(U), for an arbitrary universal set U, that is complete with respect to basic set operations is the set that includes the empty set Ø and the universal set U (e.g., a set A={U, Ø}). In general, a representation should be a subset of P(U) such that the representation still forms a proper Boolean Algebraic representation with respect to intersection and union operations.

In developing a representation of sets, the above model can be used as a basis for a one-dimensional representation of a set, and a multi-dimensional representation of a set can include nested sets. An initial model may include simply the empty set Ø and the universal set U. Within this, the basic set operations are well defined and complete for a subset of the power set. Each set is the complement of the other; the intersection is the empty set if the operands are the universal set and the empty set; and the union is the universal set if the operands are the empty set and the universal set.

FIG. 1 illustrates a model of representing a set as a set object. In the model, a set object represents a set according to the initial model described above, and includes a single class called Set 105 (i.e., a single data structure definition). The creation of instances (e.g., creation of instances via a constructor method) is private, where the class Set 105 has a static constructor 110 that can be used to create instances of sets that can be stored in private static attributes Empty Set 115 and Universal Set 120, where Empty Set 115 can represent the empty set, and Universal Set 120 can represent the universal set. After an instance is created, the public static methods getEmptySet( ) 125 and getUniversalSet( ) 130 can be used to obtain a reference to the empty and universal sets, respectively, depending on the type of instance that is created.

The public boolean methods isEmpty( ) 135 and isUniversal( ) 140 are used to determine whether a set object represents the empty set or the universal set. The method isEmpty( ) 135 returns true if a set object represents the empty set (i.e., an instance stored in static attribute Empty Set 115 represents the empty set) and returns false otherwise. Likewise, the method isUniversal( ) 140 returns true if a set object represents the universal set and returns false otherwise.

The model also includes basic set operations, including equality, complement, intersection, union, and containedness. The equality operation is represented by the public method isEqual( ) 145, and determines whether two sets are equal (i.e., whether both set objects represent the universal set or both represent the empty set). Complement is represented by the public method complement( ) 150, which returns a reference to the empty set if the set object represents the universal set and a reference to the universal set if a set object represents the empty set. Intersection is represented by the public method intersection( ) 155, which returns the intersection of the current set with the set aSet (i.e., the argument of the method). If the set object represents the empty set, the empty set is returned (i.e., regardless of the argument, the result should be the empty set); otherwise, the set is the universal set, so the argument is returned. The union operation is represented by the public method unite( ) 160, which returns the union of the current set with the set passed as argument. If a set object represents the universal set, the universal set is returned. If a set object represents the empty set, the argument (i.e., the argument passed to unite( ) 160) is returned. The containedness operation is represented by the public method contains( ) 165, which returns false for the empty set and true for the universal set.
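The behavior described for the class Set 105 can be sketched in Python as follows. This is an illustrative approximation only: Python has no private constructors or static constructors, so the two instances are created once at module level, and the snake_case method names are adaptations of the method names given above.

```python
class Set:
    """Initial model: the only instances are the empty and the universal set."""

    _EMPTY = None       # stands in for the private static attribute Empty Set
    _UNIVERSAL = None   # stands in for the private static attribute Universal Set

    def __init__(self, universal):
        # Private by convention; callers are expected to use the getters below.
        self._universal = universal

    @staticmethod
    def get_empty_set():
        return Set._EMPTY

    @staticmethod
    def get_universal_set():
        return Set._UNIVERSAL

    def is_empty(self):
        return not self._universal

    def is_universal(self):
        return self._universal

    def is_equal(self, a_set):
        return self._universal == a_set._universal

    def complement(self):
        return Set._EMPTY if self._universal else Set._UNIVERSAL

    def intersection(self, a_set):
        # Empty set: always empty. Universal set: the argument decides.
        return self if self.is_empty() else a_set

    def unite(self, a_set):
        # Universal set: always universal. Empty set: the argument decides.
        return self if self.is_universal() else a_set

    def contains(self, element):
        return self._universal


# Plays the role of the static constructor: create both instances once.
Set._EMPTY = Set(False)
Set._UNIVERSAL = Set(True)
```

Because only two instances ever exist, every operation returns a reference to one of them, which is what keeps the model complete under complement, intersection, and union.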

Representation of a One-Dimensional Set

A second model may be developed based on the initial model discussed above. The second model includes a set object for a multi-dimensional universal set U that is defined as the Cartesian Product of multiple one-dimensional universal sets U_(d). Having a multi-dimensional set U can be used to differentiate among dimensions. Differentiating among dimensions may be useful even if there are two dimensions that have the same data type. For example, the dimension can be an abstraction used to differentiate between domains that have the same data type (and also the same universal set U) but different roles within an application's data model. As an example, two or more key-columns of a relational database table may each be in a separate dimension although two or more of the key-columns are of the same data type. To differentiate among dimensions, a time-independent order of the dimensions may be defined. This order could be defined, for example, by the order of columns in a key (i.e., a primary index), by the alphabetical order of associated column names in a table, or by an order defined by an application that uses set objects.

Although multiple dimensions are defined, the empty set and the universal set might not be assigned to a specific dimension. Instead, they may be defined to be members of a dimension with a positive infinite order. Thus, the dimension of any sets other than those represented by the empty or universal set is less than the dimension of the empty set and the universal set.

Dimensions in the second model are identified in the following by a unique index number running from 1 to the maximum number of dimensions. In the case where each set object represents one-dimensional sets, arbitrary universal sets U_(d) (i.e., one-dimensional universal sets of a dimension d) are defined in accordance with the rules: the elements of U_(d) are ordered, and at least one non-empty subset T_(d) ⊂ U_(d) can be represented by a data type on a computer system.

The description of the second model will be restricted to the subset M_(d) of the power set of U_(d) which is given by all elementary sets defined by the function:

$$\Theta(t, R) := \{\, u \in U_d \mid u \mathrel{R} t \,\} \quad \text{with } t \in T_d \text{ and } R \in \{\leq, \geq\},$$

where Θ( ) represents a function which maps a value t within T_(d) and a given binary relation R to a subset of U_(d); u is an element of U_(d); “u R t” represents a boolean operation R where u and t are operands; and R includes the operations less than or equal to and greater than or equal to. In addition to the elementary sets Θ(t, R), the set M_(d) contains all sets that can be defined by expressions upon these elementary sets using complement (with respect to U_(d)), intersection, and union operations. Functions for other binary relations like <, >, =, and ≠ can be derived using the operations of complement, intersection, and union, and can be represented by sets in accordance with that function.

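Other relations can be derived from the following functions, such as

$$\begin{aligned}
\Theta(t, <) &:= \{u \in U_d \mid u < t\} = \{u \in U_d \mid \neg(u \geq t)\} = \backslash\,\Theta(t, \geq),\\
\Theta(t, >) &:= \{u \in U_d \mid u > t\} = \{u \in U_d \mid \neg(u \leq t)\} = \backslash\,\Theta(t, \leq),\\
\Theta(t, =) &:= \{u \in U_d \mid u = t\} = \{u \in U_d \mid u \geq t \wedge u \leq t\} = \Theta(t, \geq) \cap \Theta(t, \leq), \text{ and}\\
\Theta(t, \neq) &:= \{u \in U_d \mid u \neq t\} = \{u \in U_d \mid \neg(u = t)\} = \backslash\,(\Theta(t, \geq) \cap \Theta(t, \leq));
\end{aligned}$$

where ‘\’ denotes the complement of a set with respect to the universal set U_(d) and ‘¬’ denotes the boolean ‘not’ operator.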
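These derivations can be tried out on a small finite stand-in for U_(d). In the Python sketch below, the universe of the integers 0 through 9 and all function names are illustrative assumptions; only the two elementary relations ≦ and ≧ are defined directly, and the other four are built from complement and intersection as in the formulas above.

```python
U_D = frozenset(range(10))   # finite, ordered stand-in for U_(d)


def theta_le(t):
    """Elementary set: all u in U_D with u <= t."""
    return frozenset(u for u in U_D if u <= t)


def theta_ge(t):
    """Elementary set: all u in U_D with u >= t."""
    return frozenset(u for u in U_D if u >= t)


def complement(s):
    """Complement with respect to U_D."""
    return U_D - s


def theta_lt(t):
    return complement(theta_ge(t))          # u < t   is  not (u >= t)


def theta_gt(t):
    return complement(theta_le(t))          # u > t   is  not (u <= t)


def theta_eq(t):
    return theta_ge(t) & theta_le(t)        # u = t   is  u >= t and u <= t


def theta_ne(t):
    return complement(theta_eq(t))          # u != t  is  not (u = t)
```

Note that deriving < as the complement of ≧ relies on the elements being totally ordered, which matches the rule that the elements of U_(d) are ordered.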

FIGS. 2A through 2D illustrate various representations of collections of objects. FIG. 2A illustrates a ranges table 205, FIG. 2B illustrates a Structured Query Language (SQL) expression 210, FIG. 2C illustrates a graphical representation 215 of a set, and FIG. 2D illustrates a set object 220 including a vector of knot values 225 and a vector of partition bits 230. Each of these represents the same collection of objects. Those objects include all numbers less than or equal to 3.1995, and all numbers greater than 6.1995 and less than or equal to 12.2001. The numbers 3.1995, 6.1995, and 12.2001 are an external representation of calendar month values with the format mm.yyyy where mm denotes a month and yyyy denotes a year. They are ordered according to the year (first) and the month (second).

To represent these objects, the ranges table 205 includes three rows, each of which represents a range of numbers that should be included or excluded, as indicated by the sign and the option (a ranges table is a table, which for example can be a data structure, that stores ranges of values, where each row indicates a range of values). The first row represents that all values less than or equal to 3.1995 should be included, the second row represents that all values between 6.1995 and 12.2001 should be included, and the third row represents that all values representing 6.1995 should be excluded.
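How such a ranges table might be evaluated can be sketched in Python. The row layout (sign, option, low, high), the codes "I"/"E" and "LE"/"BT"/"EQ", and the function name in_ranges are assumptions made for illustration; the description only states that each row carries a sign and an option. Calendar months are written as (year, month) tuples so that ordinary tuple comparison orders them by year first and month second.

```python
# Hypothetical encoding of the three rows of the ranges table.
ROWS = [
    ("I", "LE", (1995, 3), None),         # include values <= 3.1995
    ("I", "BT", (1995, 6), (2001, 12)),   # include values between 6.1995 and 12.2001
    ("E", "EQ", (1995, 6), None),         # exclude the value 6.1995
]


def row_matches(row, value):
    """Return True if the value satisfies the option of a single row."""
    _sign, option, low, high = row
    if option == "LE":
        return value <= low
    if option == "BT":
        return low <= value <= high
    if option == "EQ":
        return value == low
    raise ValueError("unsupported option: " + option)


def in_ranges(rows, value):
    """A value is selected if some including row matches and no excluding row does."""
    included = any(row_matches(r, value) for r in rows if r[0] == "I")
    excluded = any(row_matches(r, value) for r in rows if r[0] == "E")
    return included and not excluded
```

With these rows, March 1995 and July 1995 are selected, while April 1995 and June 1995 are not, which agrees with the collection of objects described for FIGS. 2A through 2D.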

The SQL expression 210 is an expression of a “where” clause. The “AND” and “OR” clauses, and the operators <, >, and <= are used to represent relationships among values.

The graphical representation 215 uses knots and links to represent the collection of the objects. A knot is a dot along line 235 and a link is a line segment between two knots (or above or below end-knots). A knot and/or a link may be colored to indicate the inclusion of one or more members. For example, the knot 240 and the link 245 may be colored to indicate the inclusion of 3.1995 and all members less than 3.1995.

The set object 220 uses a combination of the vector 225 of knot values and the vector 230 of partition entries to represent a collection of objects. In FIG. 2D, the set object is a one-dimensional set object; in other words, the set object represents a one-dimensional set of objects. The knot values represent values of a set that partition members of the set represented by the set object 220. In FIG. 2D, the vector 225 represents knot values as a vector of calendar months (i.e., a data type that represents calendar months). For example, the knot vector 225 of the set object 220 includes the knot values 3.1995, 6.1995, and 12.2001 (i.e., March 1995, June 1995, and December 2001).

The entries included in the vector 230 include a first list that represents the knot values the set object 220 includes, and a second list that represents elements, other than the knot values, that are included in the set object 220. These two types of lists have list elements that alternate in the partition vector 230. In the set object 220, each knot value has three corresponding entries in the partition vector 230 such that corresponding values in the partition vector 230 represent whether values below, at, or above a corresponding knot value are in the set represented by the set object 220. Each entry in the partition vector 230 is represented by a bit (i.e., 1 or 0) such that the entries in the partition vector 230 are also known as partition bits. For example, the knot value 250 corresponds to partition bits 255, 260, and 265. In that example, the knot value 250 represents the value 3.1995, the partition bit 255 represents that the set represented by the set object includes elements below 3.1995, the partition bit 260 represents that the set includes elements at 3.1995, and the partition bit 265 represents that the set excludes elements above 3.1995 but below 6.1995. The set object 220 can be generated from, for example, the ranges table 205. Thus, for example, a user can input values into a ranges table and the ranges table can be used to generate a set object.

To represent sets in a consistent fashion, such that elements in a set would have the same representation in any two set objects, the set object 220 is normalized. The set object 220 is normalized such that at least the following rules are adhered to: there are no redundant knot values (i.e., no two knot values are alike), and knot values are represented in increasing order. For every knot, the corresponding knot bit must not be equal to both of the two surrounding link bits. If a knot bit and its two surrounding link bits are equal (i.e., they are either all 0 or all 1), the knot value may be removed from the knot value vector together with the corresponding knot bit and one of the two link bits from the link bit vector (the size of the set is decreased by 1 by this operation). These steps can be repeated until the set is either normalized or all knot values are removed from the knot value vector. In the latter case, the normalized representation of the set is not a one-dimensional set; rather, the set is either the empty set (if the sole remaining link bit is 0) or the universal set (if the sole remaining link bit is 1). In other implementations, additional and/or different rules may be used to normalize the representation of sets as set objects.

As will be discussed in further detail later, the set object 220 can be represented in a computer system as a data structure that includes a knot vector, such as the vector 225, and a partition vector, such as the vector 230. For example, in an object-oriented language such as C++, the set object may be represented as a class that includes a vector of float values and a bit vector. In alternative implementations, any types of values may be used to represent the vectors. For example, the partition vector need not be represented as a bit vector. Also, in alternative implementations, any number of vectors can be used to represent the set object 220. For example, each of the two lists that are included in the partition vector 230 may exist in a corresponding vector, such that there are two vectors.
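
As a minimal sketch of such a data structure, the following Python fragment pairs a knot vector with a partition bit vector. The names `DimensionSetSketch`, `knots`, `bits`, and `fig_2d` are hypothetical, (year, month) tuples merely stand in for a calendar-month data type, and the bit values are the ones implied by the description of FIG. 2D rather than values read from the figure.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a calendar month is modeled as (year, month), so that tuple
# comparison orders by year first and month second.
Month = Tuple[int, int]


@dataclass
class DimensionSetSketch:
    knots: List[Month]  # sorted, distinct knot values (cf. vector 225)
    bits: List[int]     # 2n+1 partition bits (cf. vector 230)

    def __post_init__(self) -> None:
        # Every knot contributes a knot bit and a link bit, plus one leading link bit.
        if len(self.bits) != 2 * len(self.knots) + 1:
            raise ValueError("expected 2n+1 partition bits for n knot values")


# "u <= 3.1995, or 6.1995 < u <= 12.2001" with bits assumed from the text.
fig_2d = DimensionSetSketch(
    knots=[(1995, 3), (1995, 6), (2001, 12)],
    bits=[1, 1, 0, 0, 1, 1, 0],
)
```

Keeping both lists in one bit vector follows the alternating layout described above; splitting them into two vectors would be an equally valid variant.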

Set Object Data Structure: One-Dimensional Set

Referring back to the second model discussed above, a one-dimensional set S can be a true subset of a dimension's universal set U_(d) and a true superset of the empty set Ø.

FIG. 3 illustrates a model 300 of representing a set as a one-dimensional set object. The model 300 is similar to a Unified Modeling Language (UML) diagram and is an example of a data structure definition that can be used to generate a one-dimensional set object in accordance with a model similar to the second model. The model 300 includes a class Set 305 (which may be similar to the class Set 105 of FIG. 1), a class DimensionSet 310, and a variable of data type Dimension 315.

In the model 300, an unnormalized representation of a one-dimensional set is defined by the class DimensionSet 310 to include a knot value collection as a sorted array t[ ] 320, where the array t[ ] 320 is of size n and n represents the size of a one-dimensional set represented by the set object (e.g., for a set S, S.t[i] being elements of T, where i is an integer from 1 to the size n of a set represented by the array). The unnormalized representation further includes a bit array p[ ] 325 (i.e., a partition vector that is implemented as a bit array) of size 2n+1 (i.e., the size of the array depends on the size of the corresponding knot value collection). As an example, if a set object represents a set S, where S includes {2, 3, 4}, the size n would be 3, the size of the array t[ ] 320 would be 3, and the size of the bit array p[ ] would be 7. In the model 300, the method size( ) 330 returns the size of a set represented by a set object (e.g., the size of t[ ]=n).

In the array p[ ] 325, the first bit p[1] indicates whether all values less than t[1] belong to a set represented by the set object. Because p[ ] 325 is a bit array, 1 can be used to indicate that all values less than t[1] are included in the set represented by the set object and 0 can be used to represent that those values are not included. The last bit p[2n+1] indicates whether all values greater than t[n] belong to the set. All other bits either represent a link (i.e., a range of values between two knot values) or a knot value, and whether a link or a knot value is included in the set. Link bits include bits p[2i+1], where 1≦i≦n−1. Thus, each link bit p[2i+1] indicates whether all values greater than t[i] and less than t[i+1] belong to the set. Knot bits include the bits p[2i], where 1≦i≦n. Thus, each bit p[2i] indicates whether the knot value t[i] belongs to the set.
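
The indexing of knot bits and link bits can be illustrated with a small membership test. This is a sketch under stated assumptions rather than a method taken from the model 300: the function name `contains` is hypothetical, the arrays are 0-based (so the text's p[k] is `p[k - 1]`), and the link bit after the i-th knot is taken to cover the values between the i-th and the (i+1)-th knot.

```python
from bisect import bisect_left
from typing import List, Sequence


def contains(t: Sequence, p: List[int], u) -> bool:
    """Return True if u belongs to the set with sorted knots t and 2n+1 bits p."""
    i = bisect_left(t, u)  # number of knot values strictly less than u
    if i < len(t) and t[i] == u:
        # u is the (i+1)-th knot: knot bit p[2(i+1)] in the text, index 2i+1 here.
        return p[2 * i + 1] == 1
    # u lies strictly after i knots: bit p[2i+1] in the text, index 2i here.
    return p[2 * i] == 1


# Months as (year, month) tuples; bits assumed from the description of FIG. 2D.
t = [(1995, 3), (1995, 6), (2001, 12)]
p = [1, 1, 0, 0, 1, 1, 0]
```

With these values, `contains(t, p, (1995, 3))` is true (the knot 3.1995 is included) and `contains(t, p, (1995, 6))` is false (the knot 6.1995 is excluded). The binary search makes the lookup logarithmic in the number of knots.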

Thus, the knot value array t[ ] 320 defines a partition of the universal set U_(d) (i.e., a partition in the sense of set theory such that a partition of the set U_(d) is a set of nonempty subsets of U_(d) such that every element u in U_(d) is in exactly one of these subsets) having 2n+1 blocks (i.e., disjoint subsets). The blocks for a set S can be defined by
$S.\Pi(1) := \{u \in U_{d} \mid u < S.t(1)\}$;
$S.\Pi(2i) := \{u \in U_{d} \mid u = S.t(i)\},\ i = 1 \ldots n$;
$S.\Pi(2i+1) := \{u \in U_{d} \mid u > S.t(i) \land u < S.t(i+1)\},\ i = 1 \ldots n-1$; and
$S.\Pi(2n+1) := \{u \in U_{d} \mid u > S.t(n)\}$;

where n is the size of the set, S.t(i) is a method that returns the value of the i-th knot (i.e., S.t(i):=S.t[i]), each S.Π( ) represents a block of the partition, and u is a value within the universal set U_(d). In that description of partitions, the first block (i.e., S.Π(1)) includes values u, from the universal set, below the first knot value (i.e., S.t(1)); each 2i-th block includes only a single value u of the universal set, that is equal to the corresponding i-th knot value; each (2i+1)-th block includes values u, from the universal set, that are greater than an i-th knot value and less than an (i+1)-th knot value; and a (2n+1)-th block includes values u, from the universal set, that are greater than the n-th knot value.

In the description of blocks, the blocks obey the partition rules:

$S.\Pi(i) \cap S.\Pi(j) = \varnothing,\ i \neq j$ (i.e., no blocks overlap); and $\bigcup_{i=1}^{2n+1} S.\Pi(i) = U_{d}$ (i.e., the union of all blocks is equal to the universal set U_(d)).

If a method S.P(i) is introduced such that: $S.P(i) := \begin{cases} U, & S.p[i] = 1 \\ \varnothing, & S.p[i] = 0 \end{cases}$

where the method returns either the universal set or the empty set for an i-th block depending on whether the corresponding entry in the partition array p[ ] 325 for the set S is 1 or 0, a one-dimensional set S can be expressed as: $S := \bigcup_{i=1}^{2n+1} \left( S.\Pi(i) \cap S.P(i) \right).$
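
As a worked illustration, the expression can be evaluated for the set of FIG. 2D, with n = 3 and knot values 3.1995, 6.1995, and 12.2001. The partition bits p = (1, 1, 0, 0, 1, 1, 0) used here are an assumption derived from the description of that figure, not values read from the figure itself, and the comparisons use the calendar-month order (year first, month second).

```latex
% Assumed partition bits for FIG. 2D: p = (1, 1, 0, 0, 1, 1, 0), n = 3.
% Blocks with S.p[i] = 0 are intersected with the empty set and drop out.
\begin{align*}
S &= \bigcup_{i=1}^{7} \left( S.\Pi(i) \cap S.P(i) \right) \\
  &= S.\Pi(1) \cup S.\Pi(2) \cup S.\Pi(5) \cup S.\Pi(6) \\
  &= \{u \mid u < 3.1995\} \cup \{3.1995\}
     \cup \{u \mid 6.1995 < u < 12.2001\} \cup \{12.2001\}
\end{align*}
```

The result is the set of all values less than or equal to 3.1995 together with all values greater than 6.1995 and less than or equal to 12.2001, which matches the collection described for FIGS. 2A through 2D.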

Accordingly, the class DimensionSet 310 is defined such that a union of all intersections of each block and the result of S.P(i) is included in the one-dimensional set S. Thus, the class DimensionSet 310 is defined such that the knot vector defines a partition of the universal set U_(d), and a corresponding vector of indicators specifies whether or not each block of the partition belongs to the one-dimensional set represented by the class DimensionSet 310. In alternative implementations, the class DimensionSet 310 may define normalized sets; thus, a constructor 335 of the class DimensionSet 310 can guarantee that only normalized instances of class DimensionSet 310 are created.

Representation of a Multi-Dimensional Set

FIG. 4 illustrates a multi-dimensional set object 400. The example multi-dimensional set object 400 represents a set that includes a comparison of yearly revenues of two companies having differing fiscal year variants (a fiscal year variant defines a relationship between a calendar and fiscal year, where the fiscal year variant specifies the number of periods, special periods in a fiscal year, and how to determine assigned posting periods). The multi-dimensional set object 400 represents a four-dimensional set (i.e., a collection of objects in four dimensions) that includes the dimensions company, account, year, and month, in a specified order (i.e., the order of dimensions may be arbitrarily defined). The first company is referred to as company 0001 and the second company is referred to as company 0002; the accounts include account identifiers marked in the range of 800000 and 900000; the years include 2003 and 2004; and the months are represented by numbers ranging from 1 through 12. As an SQL statement, the set would be represented using the expression: (Company=‘0001’ AND Year=2004 AND Month BETWEEN 1 AND 12 OR Company=‘0002’ AND (Year=2003 AND Month BETWEEN 4 AND 12 OR Year=2004 AND Month BETWEEN 1 AND 3)) AND Account≧‘800000’ AND Account<‘900000’.

A multi-dimensional set object, in general (i.e., where there exists more than one dimension in a model of a collection of objects), is either a Cartesian Product of two sets, or an ordered conjunction (i.e., a union) of sets. In FIG. 4, the multi-dimensional set object 400 is built from one-dimensional set objects (including the empty set and the universal set), and further includes Cartesian Products (e.g., in the form of Cartesian Product sets), and unions (e.g., in the form of union sets). Cartesian Product sets are Cartesian Products of a one-dimensional set and any other type of set that is not the universal set or the empty set (i.e., a Cartesian Product set object, another one-dimensional set, or a union set; however, the second set should not be the universal or empty set because the Cartesian Product set including that set would not be considered normalized). For example, a Cartesian Product set 405 contains a reference to a one-dimensional set 410 and a Cartesian Product set 415. Cartesian Product sets represent a set of all ordered pairs with the first element of each pair selected from a first set and the second element selected from a second set. Thus, the Cartesian Product set 405 is a set of ordered pairs of elements from the set represented by the one-dimensional set 410 and the Cartesian Product set 415. A Cartesian Product set is of the dimension type of the one-dimensional set to which it refers (the first one-dimensional set if there are two one-dimensional sets). Thus, in the example, the Cartesian Product set 405 corresponds to the dimension account, which is the dimension type of the one-dimensional set 410 to which the Cartesian Product set 405 refers. According to the normalization conditions for a Cartesian Product set, the dimension of the first set (i.e., the one-dimensional set) should always be of lower order than the dimension of the second set. The data structure of a Cartesian Product set includes a list of two references to the sets from which a Cartesian Product is represented. For example, the Cartesian Product set 405 includes a list of two references to the one-dimensional set 410 and the Cartesian Product set 415.

A union set is a union of disjoint Cartesian Products. Each Cartesian Product consists of a block of a partition and another set (e.g., knot value 435 and a set referenced by reference 440 define a Cartesian Product). As defined in reference to the one-dimensional set, a partition is defined by a sorted vector of distinct knot values and is assigned to a dimension defined for the union set. In reference to the union set, knot values define a first set of each Cartesian Product. A second set of each Cartesian Product should be, for normalization reasons, of higher dimension order than the union set's dimension. Thus, a union set can include references to any type of set. For example, knot values in a union set 430 are in the dimension company, which is the dimension of the union set 430. A data structure of a union set includes a list of knot values and corresponding partition references. The data structure of a union set can be interpreted as a generalization of the data structure for the one-dimensional set object (i.e., a generalization of the class DimensionSet 310, discussed earlier), except the data structure may use a list of references as a partition set vector instead of a partition bit array; however, the list of references may be used similarly to a partition bit array (i.e., a reference can represent values above, at, or below a corresponding knot value).

The following model may be used to describe the representation of a multi-dimensional set object. It can be assumed that the multi-dimensional universal set U is a Cartesian Product of a set of one-dimensional universal sets U_(d); in other words: $U := \underset{d}{\times}\, U_{d}$

In that model, each dimension should be represented by a data type that can be ordered. For example, each dimension can be approximated in a computer system by an elementary data type, or by a complex data type if the complex data type can be ordered (i.e., an order can be defined among values represented by the data type; e.g., the elementary data type integer is ordered because the values of variables can be ordered, for example, in increasing number, such as 0, 1, 2, and so on). In addition, each dimension should be uniquely identifiable by, for example, a name. As an example, a column name of a relational database table may be used to uniquely identify a dimension.

The model may use a combination of one-dimensional sets to represent a multi-dimensional set. For example, the model may embed one-dimensional sets in a multi-dimensional set. A one-dimensional set S_(d) belonging to a dimension d can be embedded in a multi-dimensional set according to the following: $S := \left( \underset{d' < d}{\times} U_{d'} \right) \times S_{d} \times \left( \underset{d' > d}{\times} U_{d'} \right).$

In other words, a multi-dimensional set S should be representable as the Cartesian Product (the product operator in the expression above) of (1) all one-dimensional universal sets U_(d′) of the dimensions d′ other than the dimension d of the one-dimensional set S_(d) (i.e., the dimensions of lower or higher order than d), and (2) the set S_(d).

To represent this property, values of a one-dimensional set can be stored without explicitly storing the universal sets of the other dimensions in the Cartesian Product; the dimension information stored for the set S_(d) carries the tacit understanding that the universal sets of all other dimensions are the remaining factors of the Cartesian Product.

To provide for completeness of representation for multi-dimensional sets, two additional types of sets, other than the one-dimensional set, may be used: a Cartesian Product set and a union set. A Cartesian Product set C can represent a simple Cartesian Product of a one-dimensional set D, referred to as a first Cartesian Factor, and an arbitrary set F (i.e., arbitrary in the sense it may be a one-dimensional set, or another type of set, such as a Cartesian Product set or a union set) which will be referred to as a second Cartesian Factor. The representation will store references of these two objects as private components C.D and C.F. For normalization of the Cartesian Product set, the components of a Cartesian Product set should fulfill the constraints: C.D.dim( ) < C.F.dim( ), and C.F ≠ Ø Λ C.F ≠ U,

where C.D.dim( ) represents the order of the dimension (i.e., whether a dimension is of a higher or lower order than another dimension; the order of dimensions may be defined arbitrarily or otherwise) of the Cartesian Factor D and C.F.dim( ) represents the order of the dimension of the Cartesian Factor F, such that the order of the dimension of the Cartesian Factor D should be less than the order of the dimension of the Cartesian Factor F; and the Cartesian Factor F should not represent the empty set or the multi-dimensional universal set U.

The model of a Cartesian Product set may further include the definition of the dimensions of a Cartesian Product set and the definition of knot values and partition sets that represent the collection of objects of a Cartesian Product set. To define the dimension order of a Cartesian Product set, the dimension order of a Cartesian Product set C can be defined by the dimension order of its component D: C.dim( ):=C.D.dim( ).

The model of a Cartesian Product set may represent a collection of objects similarly to the representation provided by a one-dimensional set defined above, by using a collection of knot values and a partition set. The knot values of a Cartesian Product set may be defined by a vector of knot values represented by t and knot values may be retrieved by a method C.t( ). A partition set may be a vector of references to sets and values of partitions in the partition set may be retrieved by a method C.P( ). The knot values and the partition sets, respectively, may be defined in accordance with: C.t(i):=C.D.t(i), and C.P(i):=C.D.P(i)∩C.F.

In other words, the Cartesian Product set may inherit knot values from a one-dimensional set that represents the Cartesian Factor D, and a value of the partition set C.P(i) of a Cartesian Product set is the empty set if a corresponding bit C.D.p[i] (i.e., an i-th bit in the partition bit vector of the Cartesian Factor D) of the dimension set is “0”; otherwise it is the Cartesian Factor C.F.

The size of a Cartesian Product set C may be defined by the size of its Cartesian Factor D (i.e., C.size( ):=C.D.size( )).
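
The derivation of knot values, partition sets, and size from the two Cartesian Factors can be sketched as follows. The function name `product_view`, the sentinel `EMPTY`, and the placeholder string standing in for the second Cartesian Factor are hypothetical; the bit values for the account range are assumed for illustration only.

```python
from typing import List, Sequence, Tuple

EMPTY = None  # hypothetical stand-in for the empty set


def product_view(d_knots: Sequence, d_bits: Sequence[int],
                 factor: object) -> Tuple[list, List[object]]:
    """Return (C.t, C.P) for a Cartesian Product set C with factors D and F."""
    knots = list(d_knots)  # C.t(i) := C.D.t(i)
    # C.P(i) := C.D.P(i) ∩ C.F, i.e. F where the bit is 1 and the empty set otherwise.
    parts = [factor if bit == 1 else EMPTY for bit in d_bits]
    return knots, parts


# Assumed first factor D: Account >= '800000' and Account < '900000'.
knots, parts = product_view(["800000", "900000"], [0, 1, 1, 0, 0], "F")
size = len(knots)  # C.size( ) := C.D.size( )
```

Because the partition sets are computed on demand from the bits of D, a Cartesian Product set only needs to store its two references.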

According to this model of the Cartesian Product set, arbitrary intersections of one-dimensional sets and Cartesian Product sets can be calculated and the result will be the empty set, a one-dimensional set, or a Cartesian Product set.

The model may further define a union set (with the set types introduced thus far it is not reasonably possible to represent the union of two arbitrary sets). An unnormalized representation of a union set U may be defined by a knot value collection U.t[ ] and a partition set collection U.P[ ]. The knot value collection may be a sorted array U.t[ ] of n distinct values of the dimension's data type (i.e., U.t[i] being elements of T, as defined in the one-dimensional case). The partition set collection U.P[ ] may be a partition set array of size 2n+1, where n is the size of the set U (which may be retrieved by, for example, a method called U.size( )).

Like a one-dimensional set (defined as DimensionSet 310 above), a union set can be assigned to a dimension from which the knot values are taken. As in the one-dimensional set, n distinct knot values define a partition on the dimension to which the union set is assigned. The partition has 2n+1 elements called blocks. For each block, which is either a single value (i.e., a knot) or an open interval (i.e., a link), a Cartesian Product with the corresponding set from the partition set array U.P[ ] is calculated, and the union of the (per construction) pair-wise disjoint 2n+1 Cartesian Products defines the union set.
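
The way a union set selects one partition set per block can be sketched with a recursive membership test. The class names `DimSetSketch` and `UnionSetSketch`, the use of `True` and `False` for the universal and empty sets, and the two-dimensional example are assumptions made for illustration; the example is deliberately simpler than the structure of FIG. 4.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DimSetSketch:            # one-dimensional set: one bit per block
    dim: str
    knots: List
    bits: List[int]


@dataclass
class UnionSetSketch:          # union set: one set reference per block
    dim: str
    knots: List
    parts: List["Node"]


# Assumption: True stands for the universal set, False for the empty set.
Node = Union[bool, DimSetSketch, UnionSetSketch]


def block_index(knots: List, u) -> int:
    """0-based index of the block (knot or link) of the partition that holds u."""
    i = bisect_left(knots, u)
    return 2 * i + 1 if i < len(knots) and knots[i] == u else 2 * i


def contains(s: Node, point: Dict[str, object]) -> bool:
    if isinstance(s, bool):
        return s
    k = block_index(s.knots, point[s.dim])
    if isinstance(s, DimSetSketch):
        return s.bits[k] == 1
    return contains(s.parts[k], point)  # descend into the block's partition set


year_2003 = DimSetSketch("year", [2003], [0, 1, 0])
year_2004 = DimSetSketch("year", [2004], [0, 1, 0])
# Company '0001' with year 2004, or company '0002' with year 2003.
u = UnionSetSketch("company", ["0001", "0002"],
                   [False, year_2004, False, year_2003, False])
```

Each lookup touches one block per dimension, so the 2n+1 Cartesian Products never have to be materialized.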

FIG. 5 illustrates a model 500 for representing a set as a multi-dimensional set object. The model 500 is similar to a Unified Modeling Language (UML) diagram and provides examples of data structure definitions that can be used to generate a multi-dimensional set object in accordance with a model similar to the models discussed above. The model 500 includes a class Set 505, a class Dimension Set 510, a class Cartesian Product Set 515, and a class Union Set 520.

The classes Dimension Set 510, Cartesian Product Set 515, and Union Set 520 define a one-dimensional set object, a Cartesian Product set object, and a union set object, respectively. Each of these classes (510, 515, and 520) inherits the general class Set 505, which may be defined similarly to the class Set 105 described in reference to FIG. 1. The class Set 505 includes methods, such as the methods is_empty( ) 506 and is_universal( ) 507, which return a result indicating whether a set object is the empty set or the universal set, respectively. The class Set 505 further includes methods for basic set operations such as the methods is_equal( ) 501, complement( ) 502, intersect( ) 503, and unite( ) 504, which correspond to the basic set operations of equality, complement, intersection, and union. Those methods may be defined as method stubs in the class Set 505, and may be implemented by the classes that inherit the class Set 505. For example, if the methods should be defined differently depending on the type of set object, the methods might be defined as stubs in the class Set 505 and might be defined for each class that defines a set object. The method is_equal( ) 501 returns a boolean value representing whether a first set object from which the method is called, and a second set object that is passed as an argument, are equal. The method complement( ) 502 calculates the complement of the set object that calls the method. The method intersect( ) 503 calculates the intersection of a first set object from which the method is called, and a second set object that is passed as an argument. The method unite( ) 504 calculates the union of a first set object from which the method is called, and a second set object that is passed as an argument.
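
A minimal sketch of such a class hierarchy is shown below, with the basic set operations declared as stubs in a base class and one subclass filling in two of them. The Python rendering, the attribute names, and the decision to implement only is_equal( ) and complement( ) are assumptions for illustration; the complement shown relies on the fact that inverting every partition bit of a one-dimensional set yields its complement over the same knots.

```python
from abc import ABC, abstractmethod
from typing import List


class Set(ABC):
    """Hypothetical base class mirroring the method stubs of the class Set 505."""

    def is_empty(self) -> bool:
        return False

    def is_universal(self) -> bool:
        return False

    @abstractmethod
    def is_equal(self, other: "Set") -> bool: ...

    @abstractmethod
    def complement(self) -> "Set": ...

    @abstractmethod
    def intersect(self, other: "Set") -> "Set": ...

    @abstractmethod
    def unite(self, other: "Set") -> "Set": ...


class DimensionSet(Set):
    def __init__(self, knots: List, bits: List[int]) -> None:
        self.knots, self.bits = list(knots), list(bits)

    def is_equal(self, other: Set) -> bool:
        # Comparing representations is only meaningful for normalized sets.
        return (isinstance(other, DimensionSet)
                and self.knots == other.knots and self.bits == other.bits)

    def complement(self) -> "DimensionSet":
        # Every block flips between "in the set" and "not in the set".
        return DimensionSet(self.knots, [1 - b for b in self.bits])

    def intersect(self, other: Set) -> Set:
        raise NotImplementedError("omitted from this sketch")

    def unite(self, other: Set) -> Set:
        raise NotImplementedError("omitted from this sketch")


s = DimensionSet([3], [1, 1, 0])  # all values u <= 3
c = s.complement()                # all values u > 3
```

Declaring the operations abstract forces each set type to supply its own implementation, which matches the idea that the methods may differ per type of set object.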

As mentioned earlier, each of the classes Dimension Set 510, Cartesian Product Set 515, and Union Set 520 defines a different type of set object. The class Dimension Set 510 defines a one-dimensional set object that includes a sorted vector 511 of partition values (i.e., a vector of knot values), and a vector 512 of partition bits. In combination, the vectors 511 and 512 can represent a collection of objects in a one-dimensional space.

The class Cartesian Product Set 515 defines a Cartesian Product of two set objects, a first set being a one-dimensional set object (a first Cartesian Factor) and the second set being a set object (a second Cartesian Factor) that is not the universal set or the empty set (e.g., a union set, a Cartesian Product set, or a one-dimensional set). Although not shown, the class Cartesian Product Set 515 may include two reference variables (e.g., pointers), one for each Cartesian Factor.

The class Union Set 520 represents a disjoint union of any type of set objects. To represent this, the class defines a union set object to include a vector of knot values 521 and a vector (not shown) of references to set objects that correspond to ranges above or below a knot value, or include a knot value.

A model of a one-dimensional set, a two-dimensional set, and an n-dimensional set can include several different types of set objects, including: an empty set object, a universal set object, a one-dimensional set object, a Cartesian Product set object, and/or a union set object. A one-dimensional universal set (which can be an ordered list of distinct values, which might also be tuples if an order for these tuples is defined) can be defined with reference to the first three types of set objects (i.e., the empty set object, the universal set object, and one-dimensional sets having a sorted array of distinct values from a subset (all technically representable values) of the universal set and a normalized partition bit array; see the normalization conditions for one-dimensional sets).

In a two-dimensional universal set (i.e., the set of all ordered pairs (2-tuples) where a first component is an element of a first universal set that belongs to a first dimension and a second component is an element of a second universal set which belongs to a second dimension), the following set objects can exist and describe the model: an empty set object; a universal set object; one-dimensional set objects belonging to the first dimension (see the description of how a one-dimensional set can be embedded in a multi-dimensional space by adding a reference to a dimension abstraction); one-dimensional set objects belonging to the second dimension (i.e., one-dimensional set objects that define the second dimension and can be defined by a same data structure as the one-dimensional set objects that define the first dimension); Cartesian Product set objects having a one-dimensional set object of the first dimension as a first Cartesian factor and a one-dimensional set object of a second dimension as a second Cartesian factor; and union set objects. The Cartesian Product set objects might all belong to the first dimension per construction. In a two-dimensional space, no normalized Cartesian Product set objects might exist that belong to the second dimension. This can be true because normalization conditions can define that the dimensions have to be ordered in a Cartesian Product set object and that the second Cartesian factor must not be a reference to the empty set object or the universal set object. Union set objects can define all other set objects belonging to the first dimension (i.e., having knot values from the first universal set and a partition set array which consists of references to the empty set object, to the universal set object, and/or to one-dimensional set objects of the second dimension; for normalization reasons there can be at least one reference to a one-dimensional set object in the partition set array).

An n-dimensional universal set might be defined as the Cartesian Product of a (one-dimensional) universal set (referred to as the first dimension) with an (n−1)-dimensional universal set (referring to the second up to the n-th dimension) having the following set objects: an empty set object; a universal set object; (n−1)-dimensional set objects from the 2nd to the n-th dimension (excluding trivial set objects such as the universal set object and the empty set object); Cartesian Product set objects that can be built by combinations of a one-dimensional set object belonging to the first dimension as the first Cartesian factor and non-trivial (n−1)-dimensional set objects belonging to the 2nd up to the n-th dimension as the second Cartesian factor; and union set objects belonging to the first dimension having a sorted knot value array of distinct knot values from a subset of the first universal set and having all combinations of all (n−1)-dimensional set objects belonging to the second up to the n-th dimension, where the empty set object and the universal set object can be elements of the partition set array. As in the two-dimensional case, there can be at least one reference to a non-trivial set object in the partition set array for normalization reasons; otherwise the set object is not regarded as normalized and may be converted to an equivalent one-dimensional set object belonging to the first dimension.

Although the model 500 includes a certain definition of classes, variables, and methods, alternative implementations may include additional and/or different classes, variables, and/or methods.

Normalized Sets: Overview

Both one-dimensional sets and multi-dimensional sets can be normalized. Normalization should ensure that whenever two representations represent the same set within M_(d) (i.e., one dimension of a closed subset, of a power set of U, that is represented by a data structure), the representations are equal. In other words, two data structures that are supposed to represent a same set should have the same representation. A representation of a set S can be defined to be a normalized representation if the following constraints are true (where each N in parentheses represents a normalization rule):

(N1) The boolean expression (S.p[2i]=S.p[2i−1])Λ(S.p[2i]=S.p[2i+1]) is false for all i between 1 and S.size( ).

(N2) Two subsequent elements of S.t[ ] are distinct. I.e., S.t[i]<S.t[i+1] is true for all i between 1 and S.size( )−1.

(N3) There must exist at least one knot value. I.e., S.size( )>0 must be true.

Where S.p[ ] represents a partition vector as described earlier, S.t[ ] represents a knot vector as described earlier, and S.size( ) is a method representing the size of the knot vector of the set S.

If the first constraint (N1) or the second constraint (N2) is not fulfilled, a knot is redundant. To normalize such a set, all redundant knots can be removed from a representation (e.g., a data structure) of the set S. If the third constraint (N3) does not hold, a set S can be normalized by replacing the set S with the empty set (i.e., Set.EmptySet) if S.p[1]=0 and with the universal set (i.e., Set.UniversalSet) otherwise. Normalization constraints (N1), (N2), and (N3) may be referred to as, in combination, “standard normalization”.
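
Standard normalization can be sketched as a single pass over the knots. The function name `normalize`, the string stand-ins for Set.EmptySet and Set.UniversalSet, and the 0-based arrays are assumptions. The sketch also assumes that the knot values are already strictly increasing, because the text does not say which bits survive when duplicate knots violating (N2) are merged; such input is rejected instead.

```python
from typing import List, Sequence, Tuple, Union

EMPTY_SET = "Set.EmptySet"          # hypothetical stand-ins for the two
UNIVERSAL_SET = "Set.UniversalSet"  # trivial set objects


def normalize(t: Sequence, p: Sequence[int]) -> Union[str, Tuple[list, List[int]]]:
    """Apply (N1) and (N3) to sorted knots t and 2n+1 partition bits p (0-based)."""
    knots: list = []
    bits: List[int] = [p[0]]  # bit for all values below the first knot
    for i, knot in enumerate(t):
        if i > 0 and not t[i - 1] < knot:
            raise ValueError("knot values must be strictly increasing (N2)")
        knot_bit, link_after = p[2 * i + 1], p[2 * i + 2]
        if bits[-1] == knot_bit == link_after:
            continue  # (N1): redundant knot, the surrounding links merge into one
        knots.append(knot)
        bits += [knot_bit, link_after]
    if not knots:  # (N3): no knot left, only one link bit remains
        return UNIVERSAL_SET if bits[0] == 1 else EMPTY_SET
    return knots, bits
```

For example, `normalize([3, 6], [1, 1, 1, 0, 0])` drops the redundant knot 3 and returns `([6], [1, 0, 0])`, i.e. all values less than 6. One pass suffices because removing a knot leaves the preceding link bit unchanged, so knots that were already kept remain non-redundant.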

Beyond standard normalization, normalization conditions could be enforced depending on further properties of U=U_(d), which may include (O1) through (O3), which are defined as follows:

(O1) The minimum element of T (if it exists) is also a minimum element of U: $\forall u \in U\ (u \geq \min(T))$;

(O2) The maximum element of T (if it exists) is also a maximum element of U: $\forall u \in U\ (u \leq \max(T))$;

(O3) Each element of U has either a unique successor (predecessor) or is (if it exists) the maximum (minimum) element of U: $\forall u \in U\ \big( \exists u_{succ} \in U\ ( \forall \nu \in U\ ( u \leq \nu \leq u_{succ} \Rightarrow \nu = u \lor \nu = u_{succ} ) ) \lor \forall \nu \in U\ ( \nu \leq u ) \big)$; and T must be a well-ordered subset of U.

Satisfying further normalization assists in performing basic set operations that will be described later. Depending on a data type that represents a universal set U_(d), any number of the conditions may be followed and further normalization may be satisfied. As examples, Table 1 describes types of universal sets in the first column, corresponding data types in the second column, and corresponding normalization conditions that may be required such that a set is further normalized for purposes of the set operations described later. The data types described in Table 1 are for the ABAP programming language and may differ depending on the programming language.

TABLE 1

Universal Set U | Data type T | Normalization properties
Real numbers ℝ | Double- or single-precision float, (binary coded decimal) BCDs | [none]
Integer numbers ℤ | Integer (1 byte, 2 byte, . . . ); BCDs without decimals | (O3)
Natural numbers ℕ₀ | Numeric characters | (O1), (O3)
Date values since 0001-01-01 including initial value 0000-00-00 | All date values less or equal 9999-12-31 | (O1), (O3)
Strings | Strings | (O1)
Set with a finite number of elements | Time of a day (in hours, minutes, and seconds) | (O1), (O2), (O3)
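
One way to see why a property such as (O3) matters is that, over the integers, "u ≦ 3" and "u < 4" denote the same set, yet both representations satisfy standard normalization. The following sketch only demonstrates that ambiguity; it does not reproduce a further normalization procedure. The helper `contains` and the tuple layout (knots, bits) are assumptions, and membership is checked over a finite sample range only.

```python
from bisect import bisect_left


def contains(t, p, u):
    """Membership test for sorted knots t and 2n+1 partition bits p (0-based)."""
    i = bisect_left(t, u)
    if i < len(t) and t[i] == u:
        return p[2 * i + 1] == 1  # u is a knot: use its knot bit
    return p[2 * i] == 1          # u is in a link: use the link bit


le_3 = ([3], [1, 1, 0])  # all integers u <= 3
lt_4 = ([4], [1, 0, 0])  # all integers u < 4

# Same members on a sample of integers, but two different representations.
same_members = all(
    contains(*le_3, u) == contains(*lt_4, u) for u in range(-100, 101)
)
same_representation = le_3 == lt_4
```

Here `same_members` is true while `same_representation` is false, so a comparison of representations would wrongly report the two sets as different unless a further rule selects one of the two forms for data types whose values have unique successors.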

To uniquely represent an arbitrary set, in a multi-dimensional set environment (i.e., where the universal set has multiple dimensions), as either the empty set, the universal set, a dimension set (i.e., a one-dimensional set), a Cartesian Product set, or a union set, normalization conditions may be defined. In a multi-dimensional set environment, one-dimensional sets may follow the normalization conditions discussed above. For a union set U of size n, the normalization conditions may include the following conditions (N1) through (N5):

(N1) For the partition sets the condition:

U.P[2i].isEqual(U.P[2i−1])=true Λ U.P[2i].isEqual(U.P[2i+1])=true

must be false for every knot (i=1 . . . n), where n is the size of the set U and the method isEqual( ) determines whether a partition set is equal to another partition set. In other words, no 2i-th partition set may be equal to both the (2i−1)-th partition set and the (2i+1)-th partition set. If the (N1) condition is not fulfilled for the i-th knot, the i-th knot is redundant and can be removed, as was discussed in the one-dimensional case.

(N2) Two subsequent elements of the knot value array U.t[ ] should be distinct (i.e., U.t[i]<U.t[i+1] is true for all i between 1 and n−1, where n is the size of the set U).

(N3) There should exist at least one knot (i.e., n>0 must be true). If a knot does not exist (i.e., if n=0), U could be replaced by a solely remaining partition set U.P[1] (i.e., this could be any set which belongs to a dimension of higher order).

(N4) Every partition set should belong to a “higher” dimension than the dimension of the union set. In other words, U.dim( )<U.P[i].dim( ) must be fulfilled for all i between 1 and 2n+1, where n is the size of the set U and the dimensions of all sets in the multi-dimensional model are ordered.

(N5) The logical expression:

∃ i, j (1≦i≦2n+1 ∧ 1≦j≦2n+1 ∧ U.P[i]≠Ø ∧ U.P[j]≠Ø ∧ U.P[i].isEqual(U.P[j])=false)

must be true. In other words, the partition sets in the set U that are not the empty set should not all be equal to one another. If this condition is false, a non-empty set can be factored out from the array of partition sets and the whole set U can be either represented as a Cartesian Product set (if the factor is not the universal set) or a dimension set (if the factor is the universal set). In either case, the knot array should be the same as the knot array of the unnormalized union set and a partition bit array should be derived from the partition set array U.P[ ] by setting a bit to “0” for every empty set and a bit to “1” otherwise.
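By way of illustration only, conditions (N1), (N2), (N3), and (N5) might be checked as in the following minimal sketch. The function name union_violations, the 0-based list layout, the use of None for the empty set, and the use of plain strings as stand-ins for partition sets are assumptions made for this example and are not part of the model described above. Condition (N4) is omitted because it requires dimension information.

```python
def union_violations(knots, parts, empty=None):
    """Return the names of violated conditions among N1, N2, N3 and N5.

    knots: list of n knot values; parts: list of 2n+1 partition sets.
    With 0-based lists, parts[2*i + 1] belongs to knot i, parts[2*i] lies
    before it and parts[2*i + 2] lies after it (hypothetical layout).
    """
    n = len(knots)
    if len(parts) != 2 * n + 1:
        raise ValueError("expected 2n+1 partition sets")
    bad = []
    # N1: no knot may have equal partition sets before, at, and after it.
    if any(parts[2 * i] == parts[2 * i + 1] == parts[2 * i + 2] for i in range(n)):
        bad.append("N1")
    # N2: knot values must be strictly increasing.
    if any(knots[i] >= knots[i + 1] for i in range(n - 1)):
        bad.append("N2")
    # N3: at least one knot must exist.
    if n == 0:
        bad.append("N3")
    # N5: two non-empty partition sets that differ must exist.
    non_empty = [p for p in parts if p != empty]
    if not any(p != q for p in non_empty for q in non_empty):
        bad.append("N5")
    return bad


ok = union_violations([1], ["A", None, "B"])
factorable = union_violations([1], ["A", None, "A"])
```

In this sketch the second call reports a violation of (N5), which corresponds to the case in which the non-empty set “A” could be factored out.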

To uniquely represent sets in a multi-dimensional set environment, general normalization conditions may apply. In general, normalized sets in a multi-dimensional set environment include the empty set, the universal set, a one-dimensional set, a Cartesian Product set, or a union set. Although these different set types have different attributes, common normalization conditions should include (N1) through (N4) as follows.

(N1) Each set should be uniquely assigned to a dimension. To follow this condition, the empty set and the universal set may be assigned to an artificial dimension of infinite order; one-dimensional sets and union sets may be explicitly assigned to a dimension (i.e., a dimension to which knot values belong is an explicit attribute of this kind); and a dimension of a Cartesian Product set may be derived from the dimension of its first Cartesian Factor (which may be defined to be always a one-dimensional set).

(N2) Each set has a sorted collection of unique knot values belonging to the set's dimension. Under that condition, the empty set and the universal set may have an empty knot value collection; one-dimensional sets and union sets may have an explicit knot value array; and the knot value collection of a Cartesian Product set may be the knot value array of its first Cartesian Factor.

(N3) Each set has a size. The size should be the number of knot values.

(N4) Each set has an indexed collection of partition sets. Under that condition, the size of the partition set collection is always two times the number of the knot values plus 1 (i.e., 2n+1). For the empty set and the universal set there may exist only one partition set, which is the set itself. For a one-dimensional set, the partition set at index i may be the empty set if the bit at index i of its partition bit array is 0, and the partition set at index i may be the universal set otherwise. For a Cartesian Product set, the partition set at index i may be the empty set if the bit at index i of the partition bit array of its first Cartesian Factor is 0, and the partition set at index i may be the second Cartesian Factor otherwise.

Extended Normalization of Set Objects

As discussed above, when representing a collection of objects in a computer, as a set, two set objects may represent the identical collection of objects, but they may have different representations (e.g., by not being set objects that contain the same values, such as knot values in a one-dimensional set object). For example, for a universal set being the set of all integers, the sets i<4 and i≦3 are the same (as they include the same integers); however, the different expressions may result in different set objects. Following that example, a first set object representing the first expression may have a knot array having a single knot value of 4 and a partition vector that indicates that the numbers below 4 are in the set object, but that 4 and the numbers above 4 are not (e.g., 1, 0, 0). In contrast to that representation, a second set object representing the second expression may have a knot array with the knot value 3 and a partition vector indicating that the numbers below 3 and at 3 are in the set object, but not the numbers above 3 (e.g., 1, 1, 0).
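The example above may be made concrete with the following minimal sketch, which assumes a one-dimensional set object stored as a sorted Python list of knot values and a 0-based partition bit list of size 2n+1. The function name contains and this storage layout are assumptions made for illustration only.

```python
from bisect import bisect_left


def contains(knots, bits, x):
    """Return True if x belongs to the set described by (knots, bits).

    bits[2*i + 1] describes knot i itself; bits[2*i] describes the values
    between knot i-1 and knot i (or below the first / above the last knot).
    """
    i = bisect_left(knots, x)            # number of knots strictly below x
    if i < len(knots) and knots[i] == x:
        return bits[2 * i + 1] == 1      # x is a knot value
    return bits[2 * i] == 1              # x lies in a gap


a = ([4], [1, 0, 0])   # i < 4
b = ([3], [1, 1, 0])   # i <= 3

# Both objects agree on every sampled integer, yet they differ as data.
same_members = all(contains(*a, x) == contains(*b, x) for x in range(-50, 50))
same_representation = (a == b)
```

Under these assumptions the two objects have the same members over the sampled integers while their representations differ, which is the situation that extended normalization is intended to address.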

Although normalization of set objects need not be performed and a certain amount of non-uniqueness may be acceptable, having a degree of consistency across set objects representing a same collection of objects may be desirable. Thus, in addition to, or instead of, any of the standard normalizations discussed above, other types of normalization, also referred to as “extended” normalization, may be performed. The extended normalization may take into account scenarios not covered by standard normalization such that consistency is ensured across set objects that represent a same collection of objects for those scenarios (e.g., a consistent representation of consecutive, uninterrupted sequences of values). The extended normalization may assist in implementing operations on set objects and those operations may result in consistent results (e.g., operations on multiple set objects representing a same collection of objects would have consistent results). For example, if an operation that determines whether two set objects are identical were to be implemented, an implementation might include iterating through the knot values and partition values, one-by-one, in a simple fashion, to see if the representations are the same. However, if the representations differ such that the set objects differ, implementing the operation might be much more complex, or have inaccurate results (e.g., a result determining that two set objects are not equal, although their corresponding collections of objects are identical).

FIGS. 11-11E include a flow chart 1100 of a process of normalizing a set object. The process of the flow chart 1100 can be used to generate a normalized union set (that is in accordance with the model of a multi-dimensional set object described above); however, the process can be modified to generate a normalized one-dimensional set object (such as the one-dimensional set objects described above) and the process can be modified for other implementations of a union set object. As an example, an implementation for a one-dimensional set object may be deduced by replacing “partition set” with “partition bit” and “empty set” with “bit 0.” Although specific terms are used in the flow chart 1100 that describe implementations relating to normalizing a multi-dimensional set object, broader terms can be used that include additional implementations. For example, the partition set may be described as a partition entry or partition element, either of which can include partition sets and partition bits. As another example, a knot value may be described as a knot element or value (e.g., a minimum knot value could be a minimum value of a set object).

The flow chart 1100 includes different sub-processes that perform different types of normalization and related tasks. In general, the different sub-processes involve setting up a system to generate a normalized set object (1152); normalizing a first element (first in terms of ordering) of a partition vector (1154); normalizing a last element of the partition vector (1156); normalizing links, of a partition vector, that represent intervals between knot elements by removing redundant links (1158); normalizing redundant knots of a partition vector (1117, 1120, 1121); normalizing intervals by replacing open intervals with closed intervals (1162, 1164, 1166); and returning a normalized set object (1119).

Setting up a system (e.g., a computer) to generate a normalized set object (1152) involves receiving a set object to normalize (1101) and defining temporary variables (1102). In the flow chart 1100, a multi-dimensional set object is structured similarly to the multi-dimensional set objects discussed above; thus, receiving the set object can involve receiving a vector of knot values k[n] of size n and a partition set array P[2n+1] of size 2n+1 (e.g., a partition vector) (1101). Temporary variables that are set up include a temporary index i that is used to iterate through the knot value array k, and a temporary knot vector v and a temporary partition set vector Q, which are used to return a normalized set object (1119). The temporary variables for the knot vector and partition vector are used to generate a normalized set object by including a selection of elements from the received set object in the temporary variables, and returning those temporary variables as the normalized set object.

A first element (first in the sense of ordering) of a partition vector is normalized because there can be an ambiguity for a first partition set (which defines the inclusion or exclusion of all values less than a first knot value). This can occur if a first knot value is equal to the technical minimum value and the technical minimum value is also the minimum value of the universal set (U) that can be represented by the technical data type (T). In the case that those conditions are true, the set {u in U|u<min(T)} would be the empty set. Thus, rather than having multiple set objects represent that empty set differently, normalizing a first partition set to an empty set may be advantageous.

For example, were U to be the (infinite) set of all integer numbers and T the set of all 4-byte integer numbers, the minimum of T may be −2^31 (−2147483648). For a set S defined as {u in U|u>=−2147483648}, S would not be the universal set, although there are no values in T which are not in S. Thus, an ambiguity may occur and it might be desirable to distinguish S from the universal set. As an example in a system using one-dimensional set objects, such as the one-dimensional set objects discussed above, an unnormalized representation of S may be defined by a knot array of size 1 containing the minimum technical value (−2147483648) and the partition bit array (0, 1, 1) of size 3. Normalizing the first element of the set object need not involve modifying this representation, so the collection of objects defined by the set object may be normalized simply by ensuring the representation has a certain degree of uniqueness.

As another example, were U to be a set of all natural numbers (including ‘0’) and T a set of all unsigned 1-byte integers, the minimum of T may be ‘0’ which is also the minimum of U. In this example, the set S might be defined such that {u in U|u>=0} is equal to the universal set. Thus, an unnormalized representation of S might be defined by a knot array (0) of size 1, which contains the minimum technical value, and a partition bit array (0, 1, 1) of size 3. The first partition bit might represent the set {u in U|u<0}, which is equal to the empty set. Thus, the first partition bit might be redundant. To resolve the ambiguity, the first partition bit could be set to the same value as the second partition bit. Following the example, this would result in a partition bit array (1, 1, 1). According to the standard normalization conditions, a redundant knot can be indicated and that redundancy might be removed. Were the only knot value to be removed, an empty knot value array might remain and a bit array (1) of size 1 might result. That could represent the universal set, such that the example would be normalized to the universal set.

Referring back to the flow chart 1100, normalizing the first partition set of a partition vector (1154) involves determining whether a first knot value k[1] is a minimum technical value (1103). The minimum “technical value” refers to the minimum value representable by a computer system, or, rather, chosen to be supported for a data type that corresponds to a knot value. For example, if a knot value is of a data type of 4-byte-integer, the minimum technical value supported might be −2147483648.

If the first knot value is a minimum technical value, a determination is made as to whether a minimum value of the universal set is the same as the minimum knot value (e.g., the first knot value k[1]) (1104). If so, the first partition set P[1] is set to the second partition set P[2] (in order to resolve ambiguity because no values below the minimum can exist) (1105).

If the first knot value is not a minimum technical value or the first knot value is not the same as the minimum value of the universal set, the set object is not modified, and the set object is considered normalized with respect to an ambiguity that can occur with the first element of the set object.

Normalizing a last partition set (1156) involves similar sub-processes. Normalizing the last partition set involves determining whether the last knot value k[n] is the maximum technical value (e.g., the highest 4-byte-integer) (1106), and determining whether a maximum value exists in the universal set and the last knot value k[n] is equal to that maximum value (1107). If either of those determinations has a negative result (e.g., “no”), the set object is not changed and the last partition set is considered normalized. If both of those determinations have a positive result (e.g., “yes”), the last partition set P[2n+1] is set to the penultimate partition set P[2n] (in order to resolve ambiguity because no values above the maximum can exist) (1108).
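The normalization of the first and last partition entries might be sketched as follows for a one-dimensional set object with partition bits. The function name normalize_ends, its parameters (t_min and t_max for the technical minimum and maximum of the data type, u_min and u_max for the minimum and maximum of the universal set, with None standing for "no such bound"), and the 0-based list layout are assumptions made for illustration only.

```python
def normalize_ends(knots, bits, t_min, t_max, u_min=None, u_max=None):
    """Resolve the ambiguity of the first and last partition bits.

    Returns new lists; the inputs are not modified.
    """
    knots, bits = list(knots), list(bits)
    if knots:
        # Cf. 1103-1105: no value below the first knot can exist.
        if knots[0] == t_min and u_min is not None and knots[0] == u_min:
            bits[0] = bits[1]
        # Cf. 1106-1108: no value above the last knot can exist.
        if knots[-1] == t_max and u_max is not None and knots[-1] == u_max:
            bits[-1] = bits[-2]
    return knots, bits


# Natural numbers stored as unsigned 1-byte integers: u >= 0.
natural = normalize_ends([0], [0, 1, 1], t_min=0, t_max=255, u_min=0)

# All integers stored as 4-byte integers: u >= -2147483648 is left as is.
integer = normalize_ends([-2**31], [0, 1, 1], t_min=-2**31, t_max=2**31 - 1)
```

In the first call the first bit is copied from the second bit, giving (1, 1, 1), after which standard normalization could remove the redundant knot. In the second call the universal set has no minimum, so the representation is returned unchanged.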

Normalizing redundant links (1158) involves iterating through the set object and removing redundant links. The set object is iterated through by initializing the index i (1109), ensuring the index is less than the size of the knot array (1110), and incrementing the index (1115). In each iteration, a determination is made as to whether a value in the universal set exists between two consecutive knot values (1111). If there is a value, the sub-process of iterating continues (1115, 1110). Otherwise, a determination is made as to whether the two consecutive knot values are in the collection of objects defined by the set object (1112) (e.g., if they have a same partition set).

If the knot values are in the collection of objects, the succeeding partition set is set to the current partition set (1113), which can ensure that a link reflects that two knot values are consecutive. Otherwise, the succeeding partition set is set to the empty set (1114), which can ensure that a link does not exist between consecutive knot values (e.g., where the number two is in a set of integers and the number three is not in that set of integers, there should not be a link between the knot values because no number can exist between those numbers).

The standard normalization discussed above assumes that, for every two knot values, there always exists an element of U which is located between the two knot values, even if the second knot value is the direct successor of the first knot value according to the technical data type. However, if no element of U exists between two successive knot values, the link bit between two knots can be considered redundant (as it is a redundant representation of the consecutive nature of the knot values). Ambiguity may be removed by deriving a link value from two surrounding partition entries: if the surrounding partition entries are both equal, the link would be set to either of them; otherwise, the link entry would be set to the empty set.

As an example (using one-dimensional set objects; e.g., in accordance with the substitution of partition entries with partition bits, and the like discussed above and the example one-dimensional set objects discussed above), were U to be the set of all integer numbers and T the set of all 4-byte integers, a representation of the set S {u in U|u>2 and u<=3} could be defined by the knot array (2, 3) of size 2 and the partition bit array (0, 0, 1, 1, 0). In that example, the third partition bit 1 can describe the link between the first and the second knot value. Because there exists no value in U which belongs to this link, this partition bit is considered redundant. Thus, a normalized bit value may be derived from the two surrounding partition bits, which correspond to the knot values below and above the link. If the surrounding partition bits are both equal to 1, the link bit will also be set to 1; otherwise, the link will be set to 0. In the example, this can lead to the partition bit array (0, 0, 0, 1, 0). According to the standard normalization condition, the first knot value is redundant such that the knot value can be removed. Thus, standard normalization would represent the collection of objects as a knot array (3) of size 1 and a partition bit array (0, 1, 0). This is also the representation of the set {u in U|u=3} which is equal (at least for integer numbers) to S.
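The worked example above can be reproduced with the following minimal sketch for one-dimensional set objects with partition bits. The function names normalize_links and drop_redundant_knots, the callback has_value_between, and the 0-based list layout are assumptions made for illustration; the second function stands in for the standard removal of redundant knots.

```python
def normalize_links(knots, bits, has_value_between):
    """Derive each link bit from its neighbours when the link can hold no value."""
    bits = list(bits)
    for i in range(len(knots) - 1):
        if has_value_between(knots[i], knots[i + 1]):
            continue                          # the link is meaningful, keep it
        left, right = bits[2 * i + 1], bits[2 * i + 3]
        bits[2 * i + 2] = left if left == right else 0
    return list(knots), bits


def drop_redundant_knots(knots, bits):
    """Remove every knot whose bit equals the bits directly before and after it."""
    out_knots, out_bits = [], [bits[0]]
    for i, knot in enumerate(knots):
        knot_bit, after = bits[2 * i + 1], bits[2 * i + 2]
        if out_bits[-1] == knot_bit == after:
            continue                          # the knot separates nothing
        out_knots.append(knot)
        out_bits += [knot_bit, after]
    return out_knots, out_bits


def integers_between(a, b):
    """For integer knots, a value lies strictly between a and b only if b - a > 1."""
    return b - a > 1


# S = {u | u > 2 and u <= 3} over the integers.
step1 = normalize_links([2, 3], [0, 0, 1, 1, 0], integers_between)
step2 = drop_redundant_knots(*step1)
```

Under these assumptions step1 holds the bit array (0, 0, 0, 1, 0) and step2 holds the knot array (3) with the bit array (0, 1, 0), matching the example.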

The remaining sub-processes (1116 through 1149) in the flow chart 1100 are performed during another iteration through the set object. Some of the remaining sub-processes relate to iterating through the loop (1116, 1117, 1120) and some relate to returning a normalized set object (1118, 1119). In general, the remaining sub-processes in the flow chart 1100 normalize the representation of intervals of a collection of objects. Intervals refer to open intervals and closed intervals, whether half-closed/half-open or not. As examples, the relationships greater-than, less-than, greater-or-equal-to, and less-than-or-equal-to (e.g., ‘>’, ‘<’, ‘≧’, ‘≦’), denote intervals.

In general, intervals are normalized in the flow chart 1100 by substituting right or left half-open intervals with right or left half-closed intervals, respectively (e.g., for a universal set object including all integers, modifying a set object representing the expression x>2 to represent the expression x≧3). This process may reduce ambiguity between representations of intervals (half-open and open intervals) for an uninterrupted sequence of consecutively ordered objects (e.g., consecutively ordered integers). For index values corresponding to a knot value (except for an index value equivalent to the size n of the knot array k), the sub-processes of normalizing intervals in the flow chart 1100 involve removing redundant knot values (1117, 1120, 1121), processing a preceding partition set (1162), processing a current partition set (1164), and processing a succeeding partition set (1166), where the partition sets are preceding, current, and succeeding with respect to the partition set corresponding to the index value (e.g., for the knot vector k[n], where i is the index, the partition set P[2i] would be the current partition set).

Normalizing a set object may also be important to improve the extensibility of a system (e.g., the ability to upgrade data to a new data type that offers a greater range of values and ensure results of operations are consistent across upgraded and non-upgraded data). For example, for a dimension D that is built with a decimal data type with five digits and two decimal places, the technical minimum and maximum values may be −999.99 and +999.99, respectively. If a set object is generated based on an expression d>=−999.99, and that set object is not normalized according to the extended normalization, that set object could include all values of the data type. Thus, the generated set object may have the same representation as a universal set object. However, if the dimension were to be extended such that the number of digits were increased (e.g., to six digits), ambiguity between the representations of the generated set object and the universal set object might be resolved. Because of the inconsistent ambiguity across different data types, inconsistent results could be computed when performing operations on set objects across the same data (e.g., different results could be computed when operations are performed on the five-digit set object and the six-digit set object). However, were the set objects to be normalized in accordance with the extended normalization, the set object in the five-digit data type might not have an equivalent representation as a universal set object, such that transferring the same data to a system using the six-digit data type could have representations that allow for consistent results when operations are performed on those set objects. Thus, normalization can have a relationship to the extensibility of a system. As an example, a set object representing an open interval defined by an expression d>0.00 and d<0.01 might not include any value of a data type (e.g., a decimal data type having two decimal places such that no value exists between 0.00 and 0.01). However, if the data type were to be extended such that the number of decimals were increased (e.g., from two decimal places to three decimal places), the set object could be uniquely represented with respect to an empty set object (e.g., as the set object would include a value, such as 0.001).

Redundant knot values are removed by determining whether a current partition set is the same as a preceding and succeeding partition set (1121). If that is the case, the index value is increased (1117), which effectively drops a corresponding knot value from the normalized set object that is returned (1119). Otherwise, the knot value is not considered redundant, and the process of normalizing the current knot value continues with processing the preceding partition set (1123).

If the preceding partition set is equivalent to the empty set, the process of normalizing the set object continues with processing the current partition set (1164). Otherwise, a determination is made as to whether a predecessor of the current knot value can be calculated (1131), such that a preceding knot value could be represented by a data type corresponding to the knot value. If the value can be calculated, the preceding partition set is added to a temporary partition vector (1124), and a determination is made as to whether the preceding partition set in combination with the current partition set represent a right-open interval (e.g., being a half-open interval that is part of a half-open or open interval with the right being open) (1125). If the preceding partition set in combination with the current partition set represent a right-open interval, the right-open interval is converted to a right-closed interval (1126, 1127, 1128, 1129, 1130); otherwise, the preceding partition set is considered normalized with respect to open intervals, so the process continues with processing the current knot (1136). A right-open interval is converted to a right-closed interval by determining whether the predecessor of the current knot value (e.g., predecessor in the universal set object) is the same as the previous knot value that is to be in the normalized set object (e.g., k[i−1], or k[i−2] if the previous knot value was not to be included in the normalized set object) (1126); and collapsing the partition sets for the current knot value if the predecessor of the current knot value is the same (1128), or, otherwise, converting the value that precedes the current knot value in the normalized set object (1127). Also, as part of converting a right-open interval to a right-closed interval, if a current partition set is not the empty set, the empty set is inserted in the normalized set object that is to be generated (1129, 1130), to ensure a buffer between closed intervals.

Processing the current knot value (1164) involves normalizing the set object such that the current knot value is to be included if the current knot value corresponds to an isolated value with empty preceding and succeeding links (e.g., x=4), the current knot value is the upper boundary of a right-closed interval, or the current knot value is the lower boundary of a left-closed interval. To do this, the current knot value is retained if its partition set is non-empty (1136, 1137); otherwise, the current knot value is effectively removed by not including it in the normalized set object that is to be generated (1136, 1138), which effectively removes a knot that is an upper bound of a right-open interval or a lower bound of a left-open interval. Then, if the current partition set is not equivalent to either the preceding or succeeding partition sets, the current partition set is retained, as the current knot is an isolated value (1138, 1139).

Processing of the succeeding partition set (1166) is similar to processing the preceding partition set (1162). If the succeeding partition set is determined to be an empty set, the process of iterating through the set object continues (1140, 1117). Otherwise, there is a check to determine whether the knot value is less than the technical maximum value of the data type of the knot value (1141) (such that a successor of the knot value can be calculated).

If the knot value is not less than the technical maximum, and the current partition set is not the empty set or equal to the succeeding partition set, the current partition set is retained (1142, 1144). If the knot value is not less than the technical maximum, and the partition set is the empty set, the current knot value and the corresponding partition set are retained (1143, 1150). If the knot value is not less than the technical maximum, the partition set is not the empty set, and the partition set is equivalent to the succeeding partition set, the partition set is retained (1143, 1142).

Were the current knot value to be less than the technical maximum, a check is made to determine whether the current and succeeding partition sets represent a left-open interval (1145). If the combination of the current and succeeding partition sets represents a left-open interval, the left-open interval is converted to a left-closed interval by retaining a succeeding knot value of the universal set instead of the current knot value of the set object (1146), and inserting the empty set into the partition sets if the current partition set is not the empty set (1147, 1148). In either case, whether the current and succeeding partition sets represented a left-open interval or a left-closed interval, the succeeding partition set is retained (1149).

When the process of normalizing the intervals has iterated through the set object (by going through all but the last element of the set object), the last partition set is retained (1118) and a normalized set object is returned (1119).
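The effect of replacing half-open bounds with half-closed bounds can be illustrated with the following minimal sketch. It does not follow the sub-processes of the flow chart 1100; instead it uses a simpler, swapped-in technique for one-dimensional set objects over unbounded integers: the members are first collected as merged closed intervals, and a representation is then rebuilt from those intervals. The function names, the use of None for an unbounded end, and the omission of technical minimum and maximum handling are assumptions made for illustration.

```python
def to_closed_intervals(knots, bits):
    """Collect the members as merged closed intervals (lo, hi); None = unbounded."""
    n, pieces = len(knots), []
    for j, bit in enumerate(bits):
        if not bit:
            continue
        if j % 2 == 1:                       # the knot value itself
            lo = hi = knots[j // 2]
        else:                                # a gap next to or between knots
            lo = knots[j // 2 - 1] + 1 if j > 0 else None
            hi = knots[j // 2] - 1 if j < 2 * n else None
            if lo is not None and hi is not None and lo > hi:
                continue                     # the gap holds no integer
        # Only the first piece can have lo None and only the last can have
        # hi None, so the arithmetic below never touches None.
        if pieces and pieces[-1][1] + 1 == lo:
            pieces[-1] = (pieces[-1][0], hi)  # adjacent: extend previous piece
        else:
            pieces.append((lo, hi))
    return pieces


def from_closed_intervals(pieces):
    """Rebuild (knots, bits) so that every finite bound is a member knot."""
    knots, bits = [], [0]
    for lo, hi in pieces:
        if lo is None:
            bits[0] = 1                      # everything below the first knot
        else:
            knots.append(lo)
            bits += [1, 0]
        if hi is None:
            bits[-1] = 1                     # everything above the last knot
        elif hi != lo:
            bits[-1] = 1                     # link between lo and hi
            knots.append(hi)
            bits += [1, 0]
    return knots, bits


def normalize_intervals(knots, bits):
    return from_closed_intervals(to_closed_intervals(knots, bits))


greater_than_2 = normalize_intervals([2], [0, 0, 1])      # x > 2
less_than_4 = normalize_intervals([4], [1, 0, 0])         # i < 4
```

Under these assumptions x>2 is rebuilt with the knot 3 and the bits (0, 1, 1), i.e., as x≧3, and i<4 is rebuilt with the knot 3 and the bits (1, 1, 0), i.e., as i≦3. Applying the sketch to its own output leaves the output unchanged.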

As discussed above, an altered version of the flow chart 1100 could be used to generate a normalized one-dimensional set object. In any case, the flow chart 1100 could be more generally described as involving normalizing a minimum value of the set object (1154), normalizing a maximum value of the set object (1156), normalizing consecutively-ordered elements of the set object (1158), and normalizing intervals in the set object (1162, 1164, 1166). In alternative implementations, the sub-processes of how the different types of normalizations are performed can vary.

Although the flow chart 1100 uses terminology that appears limited to multi-dimensional set objects in accordance with the models discussed above, that need not be the case. For example, a minimum value of a set object need not be a knot value, and could be any minimum number of a data type of the set object that is used to represent an expression of a collection of objects.

Normalizing a set object need not be limited to the standard and extended normalizations discussed above. Also, the “standard” normalization processes need not be default normalizations that are performed, and the extended normalizations need not be normalization processes performed in addition to any other normalization processes. To normalize set objects, any combination of processes can be used. For example, normalization of a set object that is implemented using a float data type might involve normalizing the first and last values, yet the intervals need not be normalized. Also, normalizing need not be limited to normalizing a set object with integer values. As examples, the values may be calendar dates or names of persons that are ordered according to an inherent rank. However, the data types that can be normalized using the extended normalization processes might be limited. For example, floats and strings might not be suitable for the extended normalization processes. The normalization might be applicable to only certain set objects. For example, for a Cartesian Product set object the normalization process of the flow chart 1100 might have no influence.

The normalization process in the flow chart 1100 should be a normalization process that is idempotent, which means that if the result of the process were used as input for a second normalization, there would be no changes to the result (e.g., a set object might change during a first normalization, but the normalized set object would not be modified if the same normalization process were performed on the normalized set object). The normalization of a set object according to the process of the flow chart 1100 might be performed each time a set object is created. In this fashion, it can be ensured that a set object will always be normalized. Also, to ensure an appropriate programming paradigm, the normalization may occur before a constructor for a specific set type is called, as, for example, a normalized set object might change from a multi-dimensional set object with values to the empty set or the universal set.

Basic Set Operations on a Set Object

Basic set operations on a set object include complement, union, intersection, equality and containedness. To provide for basic set operations, some fundamental properties may be required of a set. For example, this may include being able to determine the dimension order of a set. As described earlier, the dimension information of a one-dimensional set may be retrieved from a set object with a method dim( ), providing the possibility to determine the dimension order of two one-dimensional sets (i.e., checking A.dim( )<B.dim( ), if A and B are two one-dimensional set objects).

Complement

If a one-dimensional set is defined, as described earlier, to include a partition set collection represented by a partition bit array (or bit vector), the complement of a one-dimensional set can be calculated by calculating the bit-complement of the partition bit array (or bit vector) (i.e., a “0” becomes a “1”, and vice versa). This operation should not violate the standard normalization property of a set (i.e., if a one-dimensional set is normalized its complement is also normalized).
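As a minimal sketch, assuming a one-dimensional set object stored as a list of knot values and a list of partition bits (a layout and a function name chosen here for illustration only), the bit-complement might look as follows.

```python
def complement_1d(knots, bits):
    """Return the complement: same knot values, every partition bit flipped."""
    return list(knots), [1 - b for b in bits]


equals_3 = ([3], [0, 1, 0])            # {u | u = 3}
not_3 = complement_1d(*equals_3)       # {u | u != 3}
```

Because the knot values are untouched and three equal bits remain equal after flipping, a knot that was not redundant before the operation is not redundant afterwards.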

For a multi-dimensional set defined as a combination of empty sets, universal sets, union sets, Cartesian Product sets, and one-dimensional sets, where the one-dimensional sets are represented by a partition bit array (or bit vector), the complement can be calculated in accordance with the following.

The complement of a Cartesian Product set C can be calculated by using De Morgan's laws: C.complement( ):=C.D.complement( ).unite(C.F.complement( )). In other words, the complement of a data structure representing a Cartesian Product can be calculated iteratively as the union of (1) the complement of the first Cartesian Factor D and (2) the complement of the second Cartesian Factor F. For example, because the Cartesian Factor D may always be a one-dimensional set, the complement of D may be calculated as defined above, for partition bit arrays (or vectors). If the Cartesian Factor F is a one-dimensional set, the complement of the Cartesian Factor F can be calculated likewise; otherwise, it may be calculated by nested calculations using the techniques for calculating the complement of a Cartesian Product, a one-dimensional set, and/or union set.

FIG. 6 is a flowchart of a process of calculating the complement of a Cartesian Product set object. A Cartesian Product set object is a data structure representing a Cartesian Product set that follows the model described above. The data structure includes an array of knot values and a partition array of references to objects.

At 605, a Cartesian Product set object A is received. For example, the Cartesian Product set object may be received as the result of a function call asking for the complement of a Cartesian Product set object A. The Cartesian Product set object A is a normalized Cartesian Product set object defined in accordance with the model described earlier. At 610, various variables are defined and/or initialized for the processes to be performed, including (1) defining a temporary array Q of set objects of the same size as the partition bit array of the first Cartesian Factor (which is a one-dimensional set defining the dimension of the Cartesian Product set) of the Cartesian Product set object A, (2) defining an index variable i for Q and initializing it as “0”, and (3) defining a temporary variable C that is a set object.

At 615, a complement of the second Cartesian Factor of the Cartesian Product set object A is calculated, and the resulting set object is stored in the set object variable C. This may lead to a nested execution of complement operations that is finite because a dimension level is increased by at least 1 with each nested execution, according to the normalization conditions for Cartesian Product set objects. In other words, the dimension order is defined from 1 to x, where x is the last dimension, and the dimension order increases by at least one for each nested level. Thus, calculating the complement of a Cartesian Factor may lead to nested calculations for set objects that represent that Cartesian Factor.

At 620, a determination is made as to whether the index variable i is less than the size of the partition bit array of the first Cartesian Factor of the Cartesian Product set object A. If the result of that determination is “no”, at 625 a new union set object is created that has the knot value array of the first Cartesian Factor of set object A and the partition set array of set objects of Q, and the new set object is returned as a result of the process of FIG. 6. Because the first Cartesian Factor is always a one-dimensional set, the knot array from the first Cartesian Factor and the array Q, in combination, represent a normalized representation of a union set object.

Referring back to 620, if the result from the determination was “yes”, the index variable i is increased by 1, at 630. At 635, a determination is made as to whether the bit value at position i of the partition bit array of the first Cartesian Factor of set object A is equal to 0. If the result is “no”, the element at position i of array Q is set to the set object of variable C. If the result of the determination is “yes”, the element at position i of array Q is set to the universal set object. The process returns to 620 from the processes at 640 and 645, thereby allowing the process to iterate across the Cartesian Product set object A.
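The loop of FIG. 6 can be illustrated with a minimal Python sketch. The function name `complement_product`, the sentinel `UNIVERSAL`, and the use of plain lists for the knot array and partition arrays are assumptions made for illustration only; the complement of the second Cartesian Factor (variable C) is assumed to have been computed already and is passed in.

```python
UNIVERSAL = "UNIVERSAL"  # hypothetical sentinel standing in for the universal set object


def complement_product(d_knots, d_bits, c):
    """Sketch of steps 610-645 of FIG. 6.

    d_knots, d_bits: knot array and partition bit array of the first
                     Cartesian Factor D (a one-dimensional set).
    c:               complement of the second Cartesian Factor F (step 615).
    Returns the knot array and partition set array Q of the resulting union set.
    """
    q = []
    for bit in d_bits:              # iterate over the partition bit array of D
        if bit == 0:
            q.append(UNIVERSAL)     # outside D: every tuple is in the complement
        else:
            q.append(c)             # inside D: only tuples outside F are in the complement
    return list(d_knots), q         # step 625: union set with D's knots and Q


# D = [2, 5) has knots 2 and 5; "not F" stands in for the complement of F.
knots, q = complement_product([2, 5], [0, 1, 1, 0, 0], "not F")
```

The result reuses the knot array of D unchanged, which is why the text can state that the output is already a normalized union set representation.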

A union set can be calculated using the following model. With the definitions Π_(i):=S.Π(i) for i=1 . . . 2n+1 and P_(i):=S.P(i) for i=1 . . . 2n+1, where S.P(i) and S.Π(i) are defined as defined earlier and n represents the size of the union set, a union set S can be expressed similar to a dimension set as: $S = \bigcup_{i=1}^{2n+1} \left( \Pi_{i} \cap P_{i} \right),$

where the sets Π_(i) form a partitioning of the universal set U.

Calculating the complement of this expression, according to De Morgan's laws, may be according to the expression: $\overline{S} = \bigcup_{i=1}^{2n+1} \left( \Pi_{i} \cap \overline{P_{i}} \right).$

This is because, according to De Morgan's laws, the complement of a union set is the intersection, over all i, of the union of the complement of the partition and the complement of the partition value, such that, $\overline{S} = \bigcap_{i=1}^{2n+1} \left( \overline{\Pi_{i}} \cup \overline{P_{i}} \right).$

Using the identity $U = \overline{\Pi_{i}} \cup \Pi_{i}$, that expression can be written as: $\overline{S} = \bigcap_{i=1}^{2n+1} \left( \overline{\Pi_{i}} \cup \left( U \cap \overline{P_{i}} \right) \right) = \bigcap_{i=1}^{2n+1} \left( \overline{\Pi_{i}} \cup \left( \left( \overline{\Pi_{i}} \cup \Pi_{i} \right) \cap \overline{P_{i}} \right) \right) = \bigcap_{i=1}^{2n+1} \left( \overline{\Pi_{i}} \cup \left( \overline{\Pi_{i}} \cap \overline{P_{i}} \right) \cup \left( \Pi_{i} \cap \overline{P_{i}} \right) \right).$

Using the absorption rule A=A∪(A∩B), this expression can be simplified to: $\overline{S} = \bigcap_{i=1}^{2n+1} \left( \overline{\Pi_{i}} \cup \left( \Pi_{i} \cap \overline{P_{i}} \right) \right).$

Because the sets Π_(i) form a partitioning of the universal set U, terms like $\Pi_{i} \cap \overline{P_{i}} \cap \Pi_{j} \cap \overline{P_{j}}$ will vanish for i≠j. Thus, a complete factorization of the above term will only retain the terms: $\overline{S} = \left( \bigcap_{i=1}^{2n+1} \overline{\Pi_{i}} \right) \cup \bigcup_{i=1}^{2n+1} \left( \left( \bigcap_{j=1,\ j \neq i}^{2n+1} \overline{\Pi_{j}} \right) \cap \left( \Pi_{i} \cap \overline{P_{i}} \right) \right),$

where the first term of this expression evaluates to the empty set because it is the complement of the partitioning condition $U = \bigcup_{i=1}^{2n+1} \Pi_{i}.$ As per the second term, the identity $\Pi_{i} = \bigcap_{j=1,\ j \neq i}^{2n+1} \overline{\Pi_{j}}$

can be used to simplify the whole expression to: $\overline{S} = \bigcup_{i=1}^{2n+1} \left( \Pi_{i} \cap \overline{P_{i}} \right).$
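The resulting identity can be checked numerically on a small finite example. The sketch below is only an illustration: it uses Python's built-in `set` type in place of the set objects described here, and the function name `complement_of_union_set` and the sample data are hypothetical.

```python
def complement_of_union_set(universe, partitions, values):
    """Union over i of (partition_i intersected with the complement of value_i)."""
    result = set()
    for pi, p in zip(partitions, values):
        result |= pi & (universe - p)   # Pi_i ∩ complement(P_i)
    return result


universe = set(range(6))
partitions = [{0, 1}, {2}, {3, 4, 5}]   # disjoint, and their union is the universe
values = [{0, 3}, {2, 5}, {1, 4}]       # arbitrary partition values P_i

# S as the union of Pi_i ∩ P_i, and its complement computed directly.
s = set()
for pi, p in zip(partitions, values):
    s |= pi & p
direct = universe - s

via_identity = complement_of_union_set(universe, partitions, values)
```

Both routes yield the same set here because the partitions are pairwise disjoint and cover the universe, which is the partitioning condition the derivation relies on.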

Provided that the original union set is already normalized as defined in the description of normalization conditions for union sets, either a normalized Cartesian Product set or a normalized union set should be the result of the complement of a union set.

FIG. 7 is a flowchart of a process of calculating the complement of a union set object (i.e., a data structure representing a union set). The union set follows the multi-dimensional model described above. The data structure of the union set includes a partition set array and an array of knot values. Other types of sets in the multi-dimensional model also represent partitions as arrays and collections of knot values as arrays.

At 705, a union set object A is received. For example, the union set object may be received as the result of a function call asking for the complement of a union set object. The union set object is a normalized union set object defined in accordance with the model described earlier.

At 710, variables are defined. These variables include (1) a temporary set array Q of set objects of the same size as the partition set array of the union set object A, (2) a temporary bit array B of the same size as the partition set array of the union set object A, (3) an index variable i initialized to 0, (4) a temporary variable S for a set object initialized with the universal set object, and (5) a temporary variable D for a one-dimensional set object.

At 715, a determination is made as to whether the index variable i is less than the size of the partition set array of the union set object A. In other words, the determination determines whether the bounds of the partition set array are exceeded by the running index value i. The determination may be made, for example, by calling a function returning the size of a partition set array of the union set object.

If the result of the determination at 715 is “no”, the process continues at 720. At 720, a determination is made as to whether the set object S is equal to the empty set object (which is a check of a normalization condition (N5) for multi-dimensional sets, defined above). If the set object S is the empty set object, at 725 a new union set object is created and that union set includes the same dimension and the same knot value array as the union set object A, and a partition set array that is the same as the set array Q. The new union set is returned as a result of the process illustrated in FIG. 7. Otherwise, a new one-dimensional set object D is created, having the same dimension and the same knot array as the union set object A, and the bit array B as partition bit array (730). Then, at 735, a Cartesian Product set object is created from the one-dimensional set D (as a first Cartesian Factor) and the set object C (as a second Cartesian Factor), and the Cartesian Product set object is returned as the result of the process of FIG. 7.

Referring to the processes at 715, if the determination resulted in “yes”, the index variable i is incremented by 1 at 740. At 745, a determination is made as to whether the set object S is the empty set object. If the set object S is the empty set, at 770 the set object at position i of the set array Q is set to the complement of the set object at position i of the partition set array of set object A. Otherwise, at 750 a determination is made as to whether the set object at position i of the partition set array of set object A is equal to the universal set object. For example, because the union set objects may include an array of references as the partition set array, this process may include determining whether a partition set array at the index i refers to the universal set.

If, at 750, the result is “yes”, at 785 the complement of position i is calculated. This calculation includes (1) setting the bit at position i of the bit array B to 0, and (2) setting the set object at position i of the set array Q to the empty set object.

If, at 750, the result is “no”, at 755 a determination is made as to whether the set object S is the universal set object. If the set object S is the universal set object, at 790 the set object S is set to the set object at position i of the partition set array of set object A. In other words, the set object S takes the set of the current set referred to by index i of set object A. At 780, the bit at position i of the bit array B is set to 1, and the set object at position i of the set array Q is set to the complement of the set object S.

If, at 755, the result was “no”, at 760 a determination is made as to whether the set object at position i of the partition set array of set object A is equal to the set object S. If the result of this determination is “yes”, the process continues at 780, as described above. Otherwise, at 765 the set object S is set to the empty set object. At 770, the set object at position i of the set array Q is set to the complement of the set object at position i of the partition set array of set object A. From the processes described at 780, 785, and 770, the process of FIG. 7 continues in an iterative fashion, at 715.

Intersection

To perform the intersection of two one-dimensional sets, the two sets should belong to the same dimension. FIGS. 8A and 8B include a flowchart of a process of calculating an intersection of two one-dimensional set objects. The figures are logically connected by reference letters A through E. The intersection is calculated for a set object A and a set object B, and the two set objects represent normalized sets according to the normalization conditions described earlier for one-dimensional objects. In general, the process involves traversing knot values, and partition sets for each range of knot values, of the set objects A and B to see if an intersection of the two set objects lies in similar ranges of values.

At 805, two set objects, A and B, are received. Temporary variables (i.e., variables that are used for the scope of the process defined in FIGS. 8A and 8B) are defined and initialized at 810. The processes at 810 include (1) defining temporary index variables ia and ib for the knot array of each set object A and B, respectively, and initializing them with 1; (2) defining a temporary knot vector KV of size 0, an index variable ik with initial value 0 for the knot vector KV, and a temporary knot variable k; and (3) defining a temporary bit vector BV of size 1, and two temporary bit variables bk (representing a knot bit) and bl (representing a link bit).

At 815, the bit-and of a first bit in a bit array of set object A and a first bit of a bit array of set object B is calculated, and the result is stored at a first position of bit vector BV. At 820, a determination is made as to whether both index values ia and ib are greater than the number of knot values for each set. In other words, sizes of the knot arrays of each set are compared to ensure that at least one index will be within the size of the corresponding knot array of the respective set object A or B. If the determination results in a “yes”, at 825 a determination is made as to whether the size of the temporary knot vector KV is 0 (i.e., ik=0). If the size is not 0, at 830, a new one-dimensional set object is created with the temporary knot vector KV being the collection of knot values for that set object and the temporary bit vector BV being the partition bit vector for that set object, and the new set object is returned as a result. Otherwise, at 835, the empty set is returned as a result.

Referring back to the processes at 820, if the determination results in a “no”, a determination is made at 840 as to whether the index ib exceeds the size of set object B (i.e., the number of knots in the set object B), or if the current knot value for the set object A (i.e., value stored at index ia of the knot array of set object A) is less than the current knot value for the set object B.

If the result of the process at 840 is a “no”, at 845 a determination is made as to whether the index ia exceeds the (knot) size of the set object A, or whether the current knot value for the set object A is greater than the current knot value for the set object B. If the index ia exceeds the (knot) size of the set object A or the current knot value for the set object A is greater than the current knot value for the set object B, at 850, (1) the temporary knot variable k is set as the current knot value of B (i.e., the lower knot value from the previous comparison); (2) the bit variable bk is set to the bit-and of the bit value at index 2*ia−1 (i.e., 2 times ia, minus 1) of the bit array of set object A and the bit value at index 2*ib (i.e., 2 times ib) of the bit array of set object B; and (3) the index variable ib is incremented by 1.

If the result of the process at 845 is a “no”, at 855 a series of processes are performed. Those processes include (1) setting the temporary knot variable k as the current knot value of either set object A or set object B (both knot values are equal in this case, so it should not matter which knot value is used); (2) setting temporary bit variable bk to the bit-and of the bit value at index 2*ia of the bit array of set object A and the bit value at index 2*ib of the bit array of set object B; and (3) incrementing indexes ia and ib of both set objects by 1.

Referring back to 840, if the index ib exceeded the (knot) size of the set object B or if the current knot value for the set object A was less than the current knot value for the set object B, at 860 a series of processes are performed. These processes include (1) setting the temporary knot variable k as the current knot value of A (i.e., the lower knot value from the previous comparison); (2) setting the bit variable bk to the bit-and of the bit value at position 2*ia of the bit vector of set object A and the bit value at position 2*ib−1 of the bit vector of set object B; and (3) incrementing the index variable ia by 1.

At 865, the bit variable bl is set to the bit-and of the bit value at index 2*ia−1 of the bit array of set object A and the bit value at index 2*ib−1 of the bit array of set object B. At 870, a determination is made as to whether the temporary knot value stored in variable k is redundant with the representation of the previous knot value (i.e., whether the bit value at position 2*ik+1 of the temporary bit vector BV is equal to both bk and bl). This determination allows redundant knot values to be effectively removed from the result of the intersection of set objects A and B.

If the temporary knot value was not redundant, at 875 the temporary knot value is added to the temporary set object through a series of processes that include (1) increasing the size of the temporary knot vector KV by 1 and the size of the temporary bit vector BV by 2, and incrementing the index variable ik by 1; and (2) setting the knot value at position ik of the knot vector KV to k and setting the bit values at positions 2*ik and 2*ik+1 of the bit vector BV to bk and bl, respectively.
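The traversal of FIGS. 8A and 8B can be sketched in Python as follows. This is a minimal illustration, not the flowchart itself: the function name `intersect_1d` and the list-based representation are assumptions, the arrays are padded so that the 1-based indices of the description can be used directly, and the extra guard `ia <= na` at step 840 is an assumption added so that a knot of A is only read while ia is within bounds.

```python
def intersect_1d(ta, pa, tb, pb):
    """Intersect two one-dimensional sets given as (knots, partition bits).

    ta, tb: ascending knot values; pa, pb: partition bits of length 2n+1,
    where bit 2i belongs to knot i and bits 2i-1 / 2i+1 to the adjacent links.
    Returns (KV, BV); an empty KV with BV == [0] stands for the empty set.
    """
    # Pad with None so positions match the 1-based indices used in the text.
    t_a, t_b = [None] + list(ta), [None] + list(tb)
    a, b = [None] + list(pa), [None] + list(pb)
    na, nb = len(ta), len(tb)
    ia = ib = 1                                   # step 810
    kv = []
    bv = [a[1] & b[1]]                            # step 815
    while not (ia > na and ib > nb):              # step 820
        if ib > nb or (ia <= na and t_a[ia] < t_b[ib]):   # step 840
            k = t_a[ia]                           # step 860: knot of A is lower
            bk = a[2 * ia] & b[2 * ib - 1]
            ia += 1
        elif ia > na or t_a[ia] > t_b[ib]:        # step 845
            k = t_b[ib]                           # step 850: knot of B is lower
            bk = a[2 * ia - 1] & b[2 * ib]
            ib += 1
        else:                                     # step 855: equal knots
            k = t_a[ia]
            bk = a[2 * ia] & b[2 * ib]
            ia += 1
            ib += 1
        bl = a[2 * ia - 1] & b[2 * ib - 1]        # step 865: following link bit
        if not (bv[-1] == bk == bl):              # step 870: skip redundant knots
            kv.append(k)                          # step 875
            bv.extend([bk, bl])
    return kv, bv


# A = [2, 5) and B = (3, 8]; the intersection is the open interval (3, 5).
overlap = intersect_1d([2, 5], [0, 1, 1, 0, 0], [3, 8], [0, 0, 1, 1, 0])
# A = [0, 1] and B = [5, 6] are disjoint.
disjoint = intersect_1d([0, 1], [0, 1, 1, 1, 0], [5, 6], [0, 1, 1, 1, 0])
```

Because `bv[-1]` is always the most recent link bit, it plays the role of the bit at position 2*ik+1 in the redundancy check at 870.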

To calculate the intersection of two multi-dimensional sets, each multi-dimensional set should exist in the same multi-dimensional universal set U. A generalized version of the processes discussed in reference to FIGS. 8A and 8B can be used to calculate the intersection of two multi-dimensional set objects. The process can rely on the postulation that there is a given order of the dimensions, that every multi-dimensional set (and in particular either of the operands of the intersection) has an assigned dimension, that the dimension of a first operand is of less or equal order than the dimension of a second operand (if the latter is not true, the operands for the intersection may be interchanged according to the commutative property of the operation), and finally, that set components to which a multi-dimensional set object refers (e.g., the second Cartesian factor of a Cartesian Product set and all partition sets of a union set) are assigned to a higher dimension than the set itself. In that postulation, operations are performed iteratively from a dimension of a lower level to dimensions of a higher level (i.e., from 1 to x). In that postulation, the empty set and the universal set should always have the highest level (i.e., highest order dimension). Thus, calculating the intersection of two multi-dimensional sets may involve the nested intersection of all subsets to be calculated along an order of dimensions.

FIGS. 9A through 9C include a flowchart for calculating the intersection of two multi-dimensional set objects. As can be observed from comparing FIGS. 8A and 8B with FIGS. 9A through 9C, the process of FIGS. 9A through 9C can be derived from the process of FIGS. 8A and 8B. Some of the differences between the process of FIGS. 8A and 8B and the process of FIGS. 9A through 9C involve an addition of “shortcuts.” For example, the process of calculating an intersection of a multi-dimensional set involves an additional series of checks performed at 911, 913, and 915 to see if either set is the empty set, or the universal set. If either set is the empty set, the empty set is returned at 912. If either set is the universal set, the other set is returned. For example, if set object A represents the universal set, set object B is returned at 914 as the intersection of set objects A and B. This series of checks avoids having to go further down the process.

The process of FIGS. 9A through 9C involves traversing across the dimension order (i.e., dimension level) of dimension set objects for dimensions that are not of equal dimension order, and traversing the knot values of set objects and the partition sets for those knot values, of dimensions of set objects that are of the same dimension order.

At 920 and 930, a determination is made as to whether the dimension order of set object A and set object B are equivalent. If the dimension order is equivalent, at 940, 955, 950, 960, 965, 970, 975, 980, 985, 990, and 995 knot values and partition sets of two sets of multi-dimensional set objects that have the same dimension order are traversed (e.g., a dimension company of a set object A and of a set object B are traversed). Otherwise, at 935, 945-948, and at 950, the dimension order of the set objects is traversed (i.e., the dimension order is traversed to find matching dimensions across two set objects). As part of traversing the dimension orders, the process of FIGS. 9A through 9C may be called for nested calculation at 946.

The process of FIGS. 9A through 9C generates a union set according to normalization conditions (N1) and (N2) for a union set. The process may further ensure that the returned set also fulfills the normalization conditions (N3) to (N5), for example, by ensuring that the union set is properly generated.

Additional and/or different processes may be added to the process of FIGS. 9A through 9C. These processes may handle special cases. For example, the intersection of two one-dimensional sets having the same dimension may be performed according to the process of FIGS. 8A and 8B. As another example, the intersection of a one-dimensional set and an arbitrary set which belongs to a higher dimension is just a Cartesian Product set having the one-dimensional set as the first Cartesian Factor and the other set as the second Cartesian Factor. Thus, the process of FIGS. 9A through 9C may be modified accordingly.

As another example, the intersection of a Cartesian Product set as the first operand (i.e., set object A) with an arbitrary set as the second operand (i.e., set object B) that is of a higher dimension may be a special case. In that case, the result can be the empty set if the intersection of the second Cartesian Factor of the first operand (i.e., the Cartesian Factor F of set object A, A.F) with the second operand (i.e., set object B) is the empty set. Otherwise, the result can be a Cartesian Product set having the same first Cartesian Factor as the first operand (i.e., the Cartesian Factor D of set object A, A.D), and the second Cartesian Factor as the intersection of the second Cartesian Factor of the first operand (i.e., the Cartesian Factor F of the set object A, A.F) with the second operand (i.e., the set object B).

Union

Calculating the union of two one-dimensional sets can directly be derived from the algorithm for the intersection given before by making use of the identity $A \cup B = \overline{\overline{A} \cap \overline{B}}.$

Using this identity (based in general on the duality of the two operators in Boolean Algebra), the replacements (R1) and (R2) (marked in italics in FIGS. 8A and 8B) can be made to derive a method of generating the union of two sets. The replacements include:

(R1) Replace (bit-) and-operations with (bit-) or-operations, and

(R2) Replace the empty set with the universal set.

As per calculating the union of multi-dimensional sets, calculation of the union of two multi-dimensional set objects is complementary to calculating the intersection, as described herein. The process of calculating the union can be derived by performing the replacements (R1) and (R2) to the process illustrated in FIGS. 9A through 9C (appropriate text in italics):

(R1) Replace operation intersect with unite.

(R2) Replace the empty set object with the universal set object, and vice versa.

Equality

For the set operations described earlier, it may be necessary to check if two set objects are equal (i.e., having the same normalized representation and same values) or not. For two typically trivial cases, the empty set and the universal set, a comparison of the identity (i.e., comparison of object references) might be sufficient. Other sets that may require a more involved check of equality include the one-dimensional set, the Cartesian Product set, and the union set. In a data structure representation for each type of set, a method isEqual( ) may be defined differently to implement the differences of the checks of equality.

FIGS. 10A, 10B, and 10C include flowcharts illustrating processes for performing an equality check on a one-dimensional set object, a Cartesian Product set object, and a union set object, respectively. The processes illustrated in FIGS. 10A, 10B, and 10C are optimized so that they may be efficient processes of performing an equality check. Each of the figures includes a similar set of processes; thus, the three figures will be discussed together.

At 1001-1003, 1020-1022, or 1040-1042, two set objects A and B are received, temporary variables are defined, and a check is made to determine if the set objects reference the same object (e.g., checking to see if the two objects are at the same location in memory). If the set objects reference the same object, the value true is returned at 1004, 1023, or 1043, representing that the set objects A and B are equal.

At 1005, 1024, or 1044, a determination is made as to whether the set object B is a one-dimensional set object, a Cartesian Product set object, or a union set object, respectively, by determining if the set object B can be cast to a temporary variable of the respective data type (i.e., determining if the set object B is of the appropriate data type). If the result is “no”, then the set objects are not equal and the value false is returned at 1006, 1025, or 1045. Otherwise, the set object B is cast to the temporary variable at 1007, 1026, or 1046.

At 1008-1016, 1027-1037, or 1047-1051, the set objects A and B (as represented by a temporary variable) are evaluated against each other. At 1008-1016, the evaluation involves determining if the two objects belong to the same dimension, determining if the size of both set objects is equal, determining if the knot arrays contain the same knot values, and determining if the partition bit arrays contain the same values. If all of these determinations result in “yes”, true is returned at 1016; otherwise, false is returned (1009, 1011, 1013, or 1015).

At 1027-1037, the evaluation involves determining if the two objects belong to the same dimension, determining if the size of both set objects is equal, determining if the knot arrays contain the same knot values, and iteratively checking all referenced objects of the union set objects to determine if the referenced set objects represent the same values. This process may involve calling any of the processes illustrated in FIGS. 10A-10C, to determine if nested set objects are equal. If all the referenced set objects represent the same values, true is returned at 1034; otherwise, false is returned (1028, 1030, 1032, or 1037).

At 1047-1051, the evaluation involves determining whether the first Cartesian Factor of set object A is the same as the first Cartesian Factor of set object B, and determining if the second Cartesian Factor of set object A is the same as the second Cartesian Factor of set object B. These processes may involve calling any of the processes illustrated in FIGS. 10A-10C, to determine if nested set objects are equal. If the corresponding Cartesian Factors are equal, true is returned at 1051; otherwise, false is returned (1048 or 1050).
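For the one-dimensional case (steps 1001-1016), the sequence of checks might look like the following Python sketch. The class name `OneDimSet`, its attributes, and the method name `is_equal` are hypothetical; the sketch only mirrors the order of checks described above and assumes both operands are normalized.

```python
class OneDimSet:
    """Hypothetical one-dimensional set object: dimension, knots, partition bits."""

    def __init__(self, dim, knots, bits):
        self.dim = dim
        self.knots = list(knots)
        self.bits = list(bits)

    def is_equal(self, other):
        if self is other:                          # same object reference
            return True
        if not isinstance(other, OneDimSet):       # "cast" check on the data type
            return False
        if self.dim != other.dim:                  # same dimension
            return False
        if len(self.knots) != len(other.knots):    # same size
            return False
        if self.knots != other.knots:              # same knot values
            return False
        return self.bits == other.bits             # same partition bits


a = OneDimSet(1, [2, 5], [0, 1, 1, 0, 0])
b = OneDimSet(1, [2, 5], [0, 1, 1, 0, 0])
c = OneDimSet(1, [2, 5], [0, 1, 1, 1, 0])
```

Ordering the checks from cheapest to most expensive lets unequal operands be rejected before the arrays are compared element by element.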

Containedness

In order to check if a certain element c of a set C_(d) ⊂ U_(d) is contained in a one-dimensional set S of size n (i.e., n=S.size( ), where n is the size of the set representation), a binary search can be used to find the minimum index m ∈ [1 . . . n] having c≤S.t[m]. If the minimum does not exist, c>S.t[n] is true and m is set to n+1. Then, a comparison of partition bits can be performed to determine if an element is in the set S. If m≤n ∧ c=S.t[m], the result is equivalent to S.p[2m]=1. Otherwise, the result is equivalent to S.p[2m−1]=1. The effort to do this operation is just of order O(log n).
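The check can be illustrated with a short Python sketch. The function name `contains_1d` and the 0-based list storage are assumptions for illustration; the comments map each list access back to the 1-based indices S.t[m] and S.p[...] used above.

```python
from bisect import bisect_left


def contains_1d(knots, bits, c):
    """Return True if value c lies in the set given by (knots, partition bits).

    knots: ascending knot values t[1..n], stored 0-based.
    bits:  partition bits p[1..2n+1], stored 0-based.
    """
    n = len(knots)
    # bisect_left counts the knots strictly below c, so m is the minimum
    # 1-based index with c <= t[m], or n+1 if no such index exists.
    m = bisect_left(knots, c) + 1
    if m <= n and knots[m - 1] == c:
        return bits[2 * m - 1] == 1      # knot bit p[2m]
    return bits[2 * m - 2] == 1          # link bit p[2m-1]


# The set [2, 5) ∪ {7}: knots 2, 5, 7 with partition bits p[1..7].
knots = [2, 5, 7]
bits = [0, 1, 1, 0, 0, 1, 0]
```

The binary search is the only step that depends on n, which gives the O(log n) effort stated above.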

In the case of a multi-dimensional universal set U, the check of containedness has to be extended from a single scalar value as input to a tuple as input. If U is the Cartesian Product of n one-dimensional universal sets, the elements of U can be described as n-tuples where a component at the i-th position of a tuple is an element of the universal set of the i-th dimension. The check of containedness that is performed can depend on the object type. The following techniques (C1) through (C4) may be implemented to perform a check of containedness.

(C1) The empty set returns false and the universal set returns true as a result.

(C2) For the dimension set (e.g., DimensionSet 310 discussed earlier; i.e., a one-dimensional set object), the process of performing the check of containedness is generally the same as described above for a one-dimensional universal set U. The process may differ, such that, prior to a check of containedness for a tuple in a one-dimensional set, the corresponding component (i.e., if the dimension set is assigned to the i-th dimension, the value at the i-th position of the tuple has to be taken) is projected out of the tuple from which it originated.

(C3) For the Cartesian Product set, checking for containedness may be delegated to the Cartesian Factors (i.e., the two components C.D and C.F). Then, boolean results from the check of containedness of each of those Cartesian Factors can be combined by an “and” operation, and returned as the result of the Cartesian Product set (i.e., the tuple must exist in each Cartesian Factor of a Cartesian Product set).

(C4) The check for the union set is similar to the check for the dimension set described at (C2). The appropriate component of a tuple is determined according to the union set's dimension and the value v of the component is used to find the minimum m with v≤U.t[m]. If such a minimum does not exist, v>U.t[n] is true and m is set to n+1 with n=U.size( ). If m≤n ∧ v=U.t[m], the check is delegated to the (knot) partition set U.P[2m]; otherwise, a check is performed for the (link) partition set U.P[2m−1]. The corresponding result of the delegated check can be returned as the result of the union set.

Each check of containedness reduces the level of complexity of the problem at least by one, by removing a dimension order. Thus, a final result will be generated after n checks where n is the number of dimensions of the universal set U. In that scenario, each individual “dimension” check will take asymptotically O(log(m)) operations where m is the complexity (i.e., the size of a corresponding knot value array) of a set object on which a check of containedness is being performed.
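The delegation described in (C1) through (C4) can be sketched with a few small Python classes. All class names, attribute names, and the helper `_slot` are hypothetical, and the sketch assumes that a dimension is identified by a 0-based position in the tuple.

```python
from bisect import bisect_left


def _slot(knots, value):
    """0-based index of the partition entry (knot or link) that covers value."""
    m = bisect_left(knots, value) + 1            # 1-based minimum m with value <= t[m]
    if m <= len(knots) and knots[m - 1] == value:
        return 2 * m - 1                         # knot entry, 1-based position 2m
    return 2 * m - 2                             # link entry, 1-based position 2m-1


class EmptySet:
    def contains(self, tup):
        return False                             # (C1)


class UniversalSet:
    def contains(self, tup):
        return True                              # (C1)


class DimensionSet:
    def __init__(self, dim, knots, bits):
        self.dim, self.knots, self.bits = dim, knots, bits

    def contains(self, tup):                     # (C2): project out the component
        return self.bits[_slot(self.knots, tup[self.dim])] == 1


class CartesianProductSet:
    def __init__(self, d, f):
        self.d, self.f = d, f

    def contains(self, tup):                     # (C3): both factors must contain it
        return self.d.contains(tup) and self.f.contains(tup)


class UnionSet:
    def __init__(self, dim, knots, parts):
        self.dim, self.knots, self.parts = dim, knots, parts

    def contains(self, tup):                     # (C4): delegate to a partition set
        return self.parts[_slot(self.knots, tup[self.dim])].contains(tup)


x_range = DimensionSet(0, [2, 5], [0, 1, 1, 0, 0])    # x in [2, 5)
y_point = DimensionSet(1, [7], [0, 1, 0])             # y == 7
product = CartesianProductSet(x_range, y_point)
# x < 2: nothing; x == 2: everything; x > 2: only tuples with y == 7.
union = UnionSet(0, [2], [EmptySet(), UniversalSet(), y_point])
```

Each `contains` call either answers directly or hands the tuple to a set object of a higher dimension, so the recursion depth is bounded by the number of dimensions.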

Application to Data Aging

Many of today's enterprise applications and services are based on relational data models and use relational databases to store their (consistent) states persistently. Relational data models are based on mathematical set theory. Set objects could be used to improve the efficiency of a relational data model.

Set-based conditions using dimensions that are meaningful for queries, and are related to the age of data that is frequently used, could be used to define a series of archive partitions. A data store is always partitioned into an online partition O_(n) and an archive partition A_(n). Typically, more time is needed to access data from the archive partition than the online partition. Both partitions are at any time identified by two complementary set conditions (as it is the nature of partitioning). At the beginning of a life of a system including an online and archive partition, an online partition O₀ contains all the data and is described by the universal set; whereas the archive partition A₀ is described by the empty set and contains no data.

An archiving process could be defined by a series of disjoint sets R_(n) that define a series of archiving requests that change the partition state of the whole data store. All data that is selected by a request set R_(n) can be consistently moved from an online partition to an archive partition. In that scenario, the new set O_(n), describing the online partition, after an n-th archiving request, can be defined as an intersection of the previous online set and the complement of the request set; i.e., $O_{n} := O_{n-1} \cap \overline{R_{n}}.$

The new set describing the archive partition can be defined as the union of the previous archive set with the request set; i.e., $A_{n} := A_{n-1} \cup R_{n}.$

Thus, a new online partition set can be defined as the complement of the new archive partition set. If a query is to access data transparently from a whole data store (including frequently used data in an online store and less frequently used data in an archive, such as near-line storage), a selection condition of the query or the query's navigation step can be converted to a multi-dimensional set object Q. Calculating the intersection between Q and the current archive partition set A can either result in an empty set or a non-empty set. In the first case, the archive partition will not contribute to the query's result set and access to the archive (which is typically significantly slower than access to the online partition) can be avoided a priori. In the latter case, access to the archive partition cannot be avoided and the overall response time of the query depends on the indexing of the archive partition. Thus, overall response may be improved by optimizing the archiving process such that the first case frequently occurs. The overall result may be reduced access to typically slower, less frequently used storage.
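The partition updates and the query check can be illustrated with a small Python sketch. Python's built-in `set` stands in for the multi-dimensional set objects, and the function names `archive_step` and `needs_archive` as well as the year values are hypothetical.

```python
def archive_step(online, archive, request):
    """Apply one archiving request R_n and return the new pair (O_n, A_n)."""
    universe = online | archive                   # the two partitions are complementary
    new_online = online & (universe - request)    # O_n := O_(n-1) ∩ complement(R_n)
    new_archive = archive | request               # A_n := A_(n-1) ∪ R_n
    return new_online, new_archive


def needs_archive(query, archive):
    """True only if the query set Q intersects the archive partition set."""
    return bool(query & archive)


# Start of life: everything is online, nothing is archived.
online, archive = set(range(2000, 2010)), set()
online, archive = archive_step(online, archive, {2000, 2001})
online, archive = archive_step(online, archive, {2002})
```

With this state, a query restricted to 2005 and 2006 would not touch the archive, whereas a query that includes 2001 would.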

Application to Multi-Dimensional Planning

Typical planning and budgeting processes read a certain subset of records according to a selection condition from a database table, modify these records, add new records within the specified selection condition, delete some records, and write the result back to the database such that a whole subset of selected records is replaced. In a multi-user environment, the data range that can be manipulated is generally exclusively locked for a single user or process in order to avoid data inconsistencies. In some instances, it might not be sufficient to lock selected records, and a whole selected data range might be locked.

Using set objects in planning and budgeting applications may provide an exact multi-dimensional locking of data ranges, thus minimizing the resources that are locked, which may improve overall performance by avoiding bottlenecks associated with locked resources. The process of using set objects may be roughly explained as follows.

All currently locked data ranges can be stored as a multi-dimensional set (called a lock set) by a central lock service. On system startup the lock set can be set as the empty set (i.e., nothing is locked).

If a process (or user) is to create new records or change existing records for a certain non-empty data range (e.g., a range expressed by a SQL-like condition for key columns of the affected database table), and thus requires the records to be locked, the process (or user) can pass a condition indicating the data range (or the records to be locked) to the central lock service.

At the central lock service, a condition passed from a process or user can be converted to a multi-dimensional set representation, and the intersection of the multi-dimensional set representation and the current lock set can be calculated (i.e., the data to be locked can be calculated as an intersection of the current lock set and the passed condition). If the intersection is not the empty set, a lock request must be refused (i.e., there is common data, so common access should not be allowed). If the intersection is the empty set (i.e., data is not common and is available for locking), the central lock service can store a union of the passed set and the current lock set as a new lock set (i.e., the lock set is updated). In addition, the service may store the complement of the passed set in a directory of open lock requests and return a unique handle to a lock requestor (i.e., a process or user requesting to lock data). The lock requestor, which may be the lock holder (if a lock is not refused), can keep the handle for unlocking the data range.

To unlock a data range, a lock holder can pass a handle to a central lock service. In response, the central lock service can identify a set corresponding to the handle. The intersection of this set (which is the complement of the original passed condition) and the current lock set can be calculated and stored as the new lock set. In addition, the handle and its associated set can be removed from the directory of open lock requests.

To improve reliability, each of the actions, from converting a condition to a set through creating the new lock set when a data range is unlocked, can be exclusive in the system.
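The lock and unlock steps above can be sketched as follows. This is a minimal illustration, not the described implementation: Python frozensets over an assumed finite universe stand in for multi-dimensional set objects, a threading.Lock is one assumed way to make each action exclusive, and the class and method names (LockService, lock, unlock, locked) are hypothetical.

```python
import itertools
import threading


class LockService:
    """Sketch of a central lock service backed by plain sets."""

    def __init__(self, universe):
        self._universe = frozenset(universe)  # assumed finite universal set
        self._lock_set = frozenset()          # empty on startup: nothing locked
        self._open = {}                       # handle -> complement of passed set
        self._handles = itertools.count(1)    # source of unique handles
        self._mutex = threading.Lock()        # makes each action exclusive

    @property
    def locked(self):
        return self._lock_set

    def lock(self, requested):
        """Return a handle if the range could be locked, otherwise None."""
        requested = frozenset(requested)
        with self._mutex:
            if requested & self._lock_set:
                return None  # non-empty intersection: refuse the request
            self._lock_set = self._lock_set | requested
            handle = next(self._handles)
            # Store the complement of the passed set for the later unlock.
            self._open[handle] = self._universe - requested
            return handle

    def unlock(self, handle):
        """Release the data range associated with the handle."""
        with self._mutex:
            comp = self._open.pop(handle)
            # Intersecting with the complement removes the range again.
            self._lock_set = self._lock_set & comp


svc = LockService(range(10))
h1 = svc.lock({1, 2, 3})        # granted
refused = svc.lock({3, 4})      # overlaps {1, 2, 3}, so refused
h2 = svc.lock({4, 5})           # disjoint, so granted
svc.unlock(h1)                  # only {4, 5} remains locked
```

Storing the complement at lock time means that unlocking reduces to a single intersection with the current lock set, which matches the procedure described above.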

Application to Authorization

Today's implementations for authorization-checks define authorizations positively. Authorizations can be (inclusively) added to a referenced authorization profile but cannot be (exclusively) subtracted from it.

If a role authorization were expressed as a multi-dimensional set, it may be possible to derive user authorizations from the role authorization by simply including (union) and/or excluding (intersection with the complement) a user-specific authorization set from a referenced role authorization set.

For example, a multi-dimensional set object may represent roles corresponding to resources. In that example, a first dimension may represent available roles and a second dimension may represent resources. If a user requests access to a resource, a check of containedness can be performed for the role corresponding to the user and the resource. If the combination of the role and the resource, as a tuple, is included in the multi-dimensional set object, the user may be granted access; otherwise, the user may be denied access.
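The derivation of user authorizations and the containedness check can be sketched as follows. The sketch assumes a two-dimensional set modeled as a Python frozenset of (role, resource) tuples; the role and resource names and the helpers derive and is_authorized are hypothetical. Applying the inclusion before the exclusion is also an assumption, since the description does not fix an order.

```python
# Hypothetical role authorization set: (role, resource) tuples.
ROLE_AUTH = frozenset({
    ("clerk", "invoices"),
    ("clerk", "orders"),
    ("manager", "invoices"),
    ("manager", "orders"),
    ("manager", "payroll"),
})


def derive(role_auth, include=frozenset(), exclude=frozenset()):
    """Derive a user authorization set from a role authorization set.

    Inclusion is a union. Exclusion is an intersection with the
    complement of the excluded set, which equals a set difference.
    """
    return (role_auth | include) - exclude


def is_authorized(auth, role, resource):
    """Containedness check for the (role, resource) tuple."""
    return (role, resource) in auth


user_auth = derive(
    ROLE_AUTH,
    include={("clerk", "reports")},
    exclude={("clerk", "orders")},
)
```

Here the user keeps the clerk's access to invoices, gains access to reports, and loses access to orders, without the referenced role authorization set being changed.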

The disclosed subject matter and all of the functional operations described herein can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The disclosed subject matter can be implemented as one or more computer program products, i.e., one or more computer programs tangibly embodied in an information carrier, e.g., in a machine-readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

One type of programming language, known as an object oriented programming language, may use classes to define data structures. A class defines the members of an object. Objects are reusable software components. Each object is an instance of a class. Members of a class may include methods, variables, and references. Methods, also known as procedures, functions, and the like, include a series of statements that are compiled and/or executed by a processor and/or virtual machine. Methods may generate a return value, also known as output. Methods can use mechanisms and techniques other than return values to produce output, including mechanisms that cause information to be written to a file, displayed on a display device, or sent over a network. Methods are invoked by a function call. A function call specifies the method name and may provide arguments that a called method can manipulate. Constructors are a special type of method that initializes an object and/or generates an instance of an object. Variables, also known as parameters, attributes, and the like, can be assigned a value. Variables may be constant, such that the assigned value need not change during the execution of a program, or dynamic, such that the assigned value may change during the execution of a program. Variables can be of any data type, including character, integer, float, packed integer, and user-defined class. Variables can also be in the form of a reference-type variable, known as a pointer. A reference need not be a variable, and can be used to reference a variable. In other programming languages, or types of programming languages, programming constructs other than a class may represent data structures.

A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described herein, including the method steps of the disclosed subject matter, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the disclosed subject matter by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the disclosed subject matter can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

To provide for interaction with a user, the disclosed subject matter can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

The disclosed subject matter can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), or a front-end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the disclosed subject matter), or any combination of such back-end, middleware, and front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Although the methods of FIGS. 6, 7, 8A-8B, 9A-9C, and 10A-10C are shown as being composed of certain processes, additional, fewer, and/or different processes can be used instead. For example, the processes of FIGS. 10A through 10C may be improved by calculating a hash value for every multi-dimensional set (e.g., a practically unique Secure Hash Algorithm v. 1.0 (i.e., SHA1) hash value). This may improve performance related to detecting whether two set objects are or are not equal, as two hash values may be the same for two multi-dimensional sets that represent the same collection of objects; thus, a multi-dimensional set need not be traversed and/or compared against another multi-dimensional set each time the equality of two multi-dimensional sets is calculated. Similarly, the processes need not be performed in the order depicted.
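The hash-based equality check can be sketched as follows. The sketch assumes that a sorted list of tuples serves as the normalized (canonical) form of a set, which is a simplification of the normalized set objects discussed in this description; the helper name set_digest is hypothetical. The hashlib.sha1 function from the Python standard library computes the SHA1 value.

```python
import hashlib


def set_digest(tuples):
    """Return a SHA1 hex digest of a canonical form of a collection.

    The canonical form assumed here is the sorted list of distinct
    tuples, so that two collections with the same members produce the
    same byte string and therefore the same digest.
    """
    canonical = repr(sorted(set(tuples))).encode("utf-8")
    return hashlib.sha1(canonical).hexdigest()


a = set_digest({(1, "a"), (2, "b")})
b = set_digest([(2, "b"), (1, "a"), (1, "a")])  # same members, other order
c = set_digest({(1, "a"), (3, "c")})
```

Differing digests show that two sets are not equal without traversing them. Equal digests indicate equality only up to the practically negligible chance of a hash collision, so an implementation could follow a digest match with a full comparison where certainty is required.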

Although a few implementations have been described in detail above, other modifications are possible. For example, although a certain representation of sets as set objects has been illustrated, modifications can be made to the representations. As another example, although bit vectors and bit arrays are used throughout the description to describe the data structure that may represent set objects, similar structures may be used to store partition sets. For example, the bit data type of a programming language need not be used and a vector of integers may be used where a scheme for interpreting those integers appropriately may be devised. Other implementations may be within the scope of the following claims.
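One possible scheme for interpreting a vector of integers as partition bits, as mentioned above, can be sketched as follows. The word size of 8 bits, the least-significant-bit-first ordering, and the helper names to_words and get_bit are all assumptions made for illustration; the description does not prescribe a particular scheme.

```python
def to_words(bits, word_size=8):
    """Pack a sequence of booleans into a list of integers.

    Bit i of the input is stored in word i // word_size at bit
    position i % word_size (least significant bit first).
    """
    words = []
    for start in range(0, len(bits), word_size):
        word = 0
        for offset, bit in enumerate(bits[start:start + word_size]):
            if bit:
                word |= 1 << offset
        words.append(word)
    return words


def get_bit(words, index, word_size=8):
    """Read back the partition bit at the given index."""
    return (words[index // word_size] >> (index % word_size)) & 1 == 1


bits = [True, False, True] + [False] * 6 + [True]  # ten partition bits
words = to_words(bits)
```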

1. A computer program product, tangibly embodied in an information carrier, the computer program product comprising instructions operative to cause data processing apparatus to: normalize a first minimum value, a first maximum value, or both the first minimum and maximum values of a first set object in accordance with a first process, wherein: the first minimum value is normalized based on a second minimum value of a universal set object corresponding to the first set object, the second minimum value being both a minimum value being supported by a data type and a minimum value defined to be in the universal set object; and the first maximum value is normalized based on a second maximum value of the universal set object, the second maximum value being both a maximum value being supported by a data type and defined to be in the universal set object; and perform a set operation on a normalized version of the first set object to generate a result, wherein, the normalized version of the first set object has the first minimum value, the first maximum value, or both the first minimum and maximum values normalized in accordance with the first process.
 2. The computer program product of claim 1, wherein the first minimum value of the first set object is modified if the first minimum value is the same as the second minimum value.
 3. The computer program product of claim 2, wherein the first set object uses a combination of knot elements and partition entries to represent a collection of objects, the first minimum value is a minimum knot element of the first set object, and the instructions to modify the first minimum value comprise instructions to modify the first set object to include a value of a second partition entry in a first partition entry, the first and second partition entries being in an ordered sequence of partition entries with the first partition entry being before the second partition entry.
 4. The computer program product of claim 3, wherein the partition entry is a partition bit.
 5. The computer program product of claim 1, wherein the first set object uses a combination of knot elements and partition entries to represent a collection of objects, the first maximum value is a maximum knot element of the first set object, and the first maximum value of the first set object is modified if the first maximum value is the same as the second maximum value.
 6. The computer program product of claim 5, wherein the instructions to modify the first maximum value comprise instructions to modify the first set object to include the value of a penultimate partition entry in a last partition entry.
 7. The computer program product of claim 1, wherein the normalized version of the first set object is further normalized in accordance with a second process, and, the instructions are further operative to: normalize consecutively-ordered elements of the first set object in accordance with the second process, wherein a first element is in a first ordered sequence before the second element and the universal set object includes the first and second elements as an uninterrupted second ordered sequence of elements.
 8. The computer program product of claim 1, wherein the normalized version of the first set object is further normalized in accordance with a second process and a third process, and, the instructions are further operative to: normalize one or more representations of one or more intervals in the first set object in accordance with the second process, the intervals representing a span of objects in the first set object; and normalize consecutively-ordered elements of the first set object in accordance with the third process, wherein a first element is in a first ordered sequence before the second element and the universal set object includes the first and second elements as an uninterrupted second ordered sequence of elements.
 9. The computer program product of claim 1, wherein the normalized version of the first set object is further normalized in accordance with a second process, and, the instructions are further operative to: normalize one or more representations of one or more intervals in a first set object in accordance with the second process, the intervals representing a span of objects in the first set object.
 10. A computer program product, tangibly embodied in an information carrier, the computer program product comprising instructions operative to cause data processing apparatus to: normalize consecutively-ordered elements of a first set object in accordance with a first process, wherein a first element is in a first ordered sequence before the second element and a universal set object corresponding to the first set object includes the first and second elements as an uninterrupted second ordered sequence of elements; and perform a set operation on a normalized version of the first set object to generate a result, wherein the normalized version of the first set object has consecutively-ordered elements that are normalized in accordance with the first process.
 11. The computer program product of claim 10, wherein normalizing consecutively-ordered knot elements of the first set object comprises removing one of the first element or the second element.
 12. The computer program product of claim 10, wherein the first and second elements are knot elements that have corresponding partition entries, and removing one of the first element or the second element comprises removing the first element and a corresponding partition entry representing inclusion of the first element in the first set object.
 13. The computer program product of claim 10, wherein the normalized version of the first set object is further normalized in accordance with a second process, and, the instructions are further operative to: normalize one or more representations of one or more intervals in a first set object in accordance with the second process, the intervals representing a span of objects in the first set object.
 14. A computer program product, tangibly embodied in an information carrier, the computer program product comprising instructions operative to cause data processing apparatus to: normalize one or more representations of one or more intervals in a first set object in accordance with a first process, the intervals representing a span of objects in the first set object; and perform a set operation on a normalized version of the first set object to generate a result, wherein the normalized version of the first set object has the representations of intervals normalized in accordance with the first process.
 15. The computer program product of claim 14, wherein the intervals are normalized to generate a single representation of a same span of objects across different set objects, and normalizing the representations comprises replacing a half-open interval with an equivalent half-closed interval.
 16. The computer program product of claim 15, wherein replacing a half-open interval with an equivalent half-closed interval comprises removing a knot element and corresponding partition entry of the first set object.